<p>Here is a vectorized approach:</p>
<pre><code>import numpy as np

# Sort data w.r.t. col-0 (the integer tag column)
data_sorted = data[data[:, 0].argsort()]
# Get counts of each tag in col-0 of data and repeat seed rows accordingly.
# Thus, we would have an extended version of seed that matches data's shape.
# (np.bincount needs non-negative integer tags.)
seed_ext = np.repeat(seed, np.bincount(data_sorted[:, 0]), axis=0)
# Get euclidean distances between extended seed version and sorted data
dists = np.sqrt(((data_sorted[:, 1:] - seed_ext[:, 1:])**2).sum(axis=1))
# Get positions of shifts in col-0 of sorted data, i.e. where each tag group starts
shift_idx = np.append(0, np.nonzero(np.diff(data_sorted[:, 0]))[0] + 1)
# Final piece of the puzzle is to get tag-based maximum values from dists,
# where each tag is a unique number in col-0 of data
diam_out = np.maximum.reduceat(dists, shift_idx)
</code></pre>
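<p>To make the steps above concrete, here is a minimal sketch on a toy input, cross-checked against a plain loop. The function name <code>max_dist_per_tag</code> and the toy arrays are hypothetical and only for illustration. The sketch assumes that the tags in col-0 of <code>data</code> are the integers <code>0..len(seed)-1</code>, that every tag occurs at least once, and that row <code>i</code> of <code>seed</code> belongs to tag <code>i</code>.</p>

```python
import numpy as np

def max_dist_per_tag(seed, data):
    """Hypothetical wrapper around the repeat + reduceat steps.

    Assumes col-0 of `data` holds integer tags 0..len(seed)-1, each present
    at least once, and that row i of `seed` corresponds to tag i.
    """
    # Sort rows by tag so that equal tags become contiguous
    data_sorted = data[np.argsort(data[:, 0], kind="stable")]
    tags = data_sorted[:, 0].astype(int)  # bincount needs integers
    # Repeat each seed row once per data row carrying its tag
    seed_ext = np.repeat(seed, np.bincount(tags, minlength=len(seed)), axis=0)
    # Row-wise Euclidean distance, ignoring col-0 on both sides
    dists = np.sqrt(((data_sorted[:, 1:] - seed_ext[:, 1:]) ** 2).sum(axis=1))
    # Start index of each tag group in the sorted array
    starts = np.append(0, np.nonzero(np.diff(tags))[0] + 1)
    # Maximum distance within each group
    return np.maximum.reduceat(dists, starts)

# Toy input: two tags, two points per tag (col-0 of seed is ignored)
seed = np.array([[0, 0.0, 0.0],
                 [1, 1.0, 1.0]])
data = np.array([[1, 1.0, 4.0],
                 [0, 3.0, 4.0],
                 [0, 0.0, 1.0],
                 [1, 1.0, 2.0]])

result = max_dist_per_tag(seed, data)

# Loop-based reference for comparison
expected = np.array([
    np.linalg.norm(data[data[:, 0] == t, 1:] - seed[t, 1:], axis=1).max()
    for t in range(len(seed))
])

print(result)  # [5. 3.]
```

<p>For tag 0 the distances to the seed point (0, 0) are 5 and 1, and for tag 1 the distances to (1, 1) are 3 and 1, so the per-tag maxima are 5 and 3. If some tag were missing from <code>data</code>, the output would have fewer entries than <code>seed</code> has rows, so that case would need separate handling.</p>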
<p>Runtime tests and output verification:</p>
<p>Function definitions:</p>
^{pr2}$
<p>Verifying the output:</p>
<pre><code>In [417]: # Inputs
     ...: seed = np.random.rand(20,20)
     ...: data = np.random.randint(0,20,(40000,20))
     ...:

In [418]: np.allclose(loopy_cdist(seed,data),vectorized_repeat_reduceat(seed,data))
Out[418]: True

In [419]: np.allclose(loopy_cdist(seed,data),vectorized_indexing_maxat(seed,data))
Out[419]: True
</code></pre>
<p>Timings:</p>
<pre><code>In [420]: %timeit loopy_cdist(seed,data)
10 loops, best of 3: 35.9 ms per loop
In [421]: %timeit vectorized_repeat_reduceat(seed,data)
10 loops, best of 3: 28.9 ms per loop
In [422]: %timeit vectorized_indexing_maxat(seed,data)
10 loops, best of 3: 24.1 ms per loop
</code></pre>