<p>Looks like a job for <code>np.unique</code>:</p>
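<pre><code>import numpy as np

uniq, inv = np.unique(x, return_inverse=True)  # inv[i] is the index of x[i] in uniq
result = np.zeros((len(x), len(uniq)), dtype=int)
result[np.arange(len(x)), inv] = 1  # one 1 per row, in the column of that row's symbol
</code></pre>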
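<p>For a quick check, here is a minimal self-contained sketch of the same idea wrapped in a function. The helper name <code>one_hot</code> and the sample input are illustrative assumptions, not part of the original snippet, and the sketch assumes a 1-D input.</p>

```python
import numpy as np

def one_hot(x):
    """Hypothetical helper: one-hot encode a 1-D array of symbols."""
    x = np.asarray(x)
    # uniq holds the sorted distinct symbols; inv maps each element to its column
    uniq, inv = np.unique(x, return_inverse=True)
    result = np.zeros((len(x), len(uniq)), dtype=int)
    # set exactly one entry per row
    result[np.arange(len(x)), inv] = 1
    return uniq, result

uniq, encoded = one_hot(list("abca"))
print(uniq)     # the column labels, in sorted order
print(encoded)  # one row per input symbol
```

<p>Returning <code>uniq</code> alongside the matrix keeps the column-to-symbol mapping available to the caller.</p>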
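<p>Regarding @Divakar's benchmarks: here is a more informative comparison. It confirms a slight speed advantage for <code>dv</code> on small alphabets, with a crossover around <code>K=20</code>; at <code>K=1000</code> the situation reverses into a several-fold advantage for <code>pp</code>. This is expected, because <code>pp</code> exploits the sparsity of the one-hot encoding. Below, <em>K</em> is the size of the alphabet and <em>N</em> is the length of the sample.</p>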
^{pr2}$
<p>Prints:</p>
<pre><code>@ K = 4
dv: 0.003458, 0.038176, 0.421894 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.004856, 0.052298, 0.603758 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 10
dv: 0.005136, 0.056491, 0.663157 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.005955, 0.054069, 0.719152 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 20
dv: 0.007201, 0.084867, 0.988886 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.007638, 0.084580, 0.891122 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 40
dv: 0.010748, 0.130974, 1.498022 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.009321, 0.103912, 1.080271 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 100
dv: 0.025357, 0.292930, 2.946326 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.011916, 0.147117, 1.641588 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 200
dv: 0.033651, 0.560753, 6.042001 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.022971, 0.221142, 3.580255 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 1000
dv: 0.156715, 2.655647, 37.112166 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.055516, 0.920938, 10.358050 secs for 100 trials @ N = 1000, 10000, 100000
</code></pre>
<p>Using <code>uint8</code> and allowing @Divakar's method to use the cheaper view cast:</p>
<pre><code>@ K = 4
dv: 0.003092, 0.038149, 0.386140 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.004392, 0.043327, 0.554253 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 10
dv: 0.004604, 0.054215, 0.501708 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.004930, 0.051555, 0.607239 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 20
dv: 0.006421, 0.067397, 0.665465 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.006616, 0.054055, 0.703260 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 40
dv: 0.008857, 0.087155, 0.862316 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.006945, 0.060408, 0.733966 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 100
dv: 0.015660, 0.142464, 1.426929 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.008063, 0.070860, 0.908615 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 200
dv: 0.025631, 0.235712, 2.401750 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.008805, 0.101772, 1.111652 secs for 100 trials @ N = 1000, 10000, 100000
@ K = 1000
dv: 0.069953, 1.024585, 11.313402 secs for 100 trials @ N = 1000, 10000, 100000
pp: 0.011558, 0.182684, 2.201837 secs for 100 trials @ N = 1000, 10000, 100000
</code></pre>
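<p>The script behind these timings is not reproduced here. As a rough sketch of how such a comparison could be set up: <code>pp</code> below is the <code>np.unique</code> approach from this answer, while <code>dv</code> is only an assumed stand-in (a broadcast equality comparison against the unique values) and may differ from the method actually benchmarked. The <code>bench</code> helper, the random integer input, and the trial count are likewise illustrative choices.</p>

```python
import timeit
import numpy as np

def pp(x):
    # Index-based approach from this answer: writes exactly N ones.
    uniq, inv = np.unique(x, return_inverse=True)
    result = np.zeros((len(x), len(uniq)), dtype=int)
    result[np.arange(len(x)), inv] = 1
    return result

def dv(x):
    # ASSUMPTION: stand-in for the compared method, not necessarily the original.
    # Broadcast comparison: does N*K comparisons regardless of sparsity.
    return (x[:, None] == np.unique(x)).astype(int)

def bench(K, N, trials=10, seed=0):
    # Hypothetical harness: random sample of length N over an alphabet of size K.
    rng = np.random.default_rng(seed)
    x = rng.integers(0, K, size=N)
    # Both functions must agree before timing them.
    assert np.array_equal(pp(x), dv(x))
    return {f.__name__: timeit.timeit(lambda: f(x), number=trials) for f in (dv, pp)}

times = bench(K=20, N=1000)
print(times)
```

<p>Under this assumption the scaling in the tables is plausible: the comparison-based variant touches all <em>N</em>×<em>K</em> entries, whereas the index-based one zero-fills the array and then sets only <em>N</em> of them.</p>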