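<p>You can get almost anything you ever dreamed of with <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.einsum.html" rel="noreferrer"><code>np.einsum</code></a>. Until you start getting the hang of it, it basically looks like black voodoo...</p>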
<pre><code>>>> import numpy as np
>>> a = np.arange(15).reshape(5, 3)
>>> b = np.arange(9).reshape(3, 3)
>>> np.diag(np.dot(np.dot(a, b), a.T))
array([ 60, 672, 1932, 3840, 6396])
>>> np.einsum('ij,ji->i', np.dot(a, b), a.T)
array([ 60, 672, 1932, 3840, 6396])
>>> np.einsum('ij,ij->i', np.dot(a, b), a)
array([ 60, 672, 1932, 3840, 6396])
</code></pre>
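<p>To take some of the voodoo out of the <code>'ij,ij->i'</code> subscripts, here is a rough sketch of what that call computes, written as plain loops. The name <code>rowwise</code> is only for illustration: every index that is missing from the output (here <code>j</code>) gets summed over, which leaves one dot product per row.</p>

```python
import numpy as np

a = np.arange(15).reshape(5, 3)
b = np.arange(9).reshape(3, 3)
ab = np.dot(a, b)

# 'ij,ij->i': multiply ab[i, j] by a[i, j], then sum over j because j does
# not appear after the arrow. One value per row i is left.
rowwise = np.empty(a.shape[0], dtype=ab.dtype)
for i in range(a.shape[0]):
    rowwise[i] = sum(ab[i, j] * a[i, j] for j in range(a.shape[1]))

# Same values as the einsum call, without ever building the 5x5 product.
print(np.array_equal(rowwise, np.einsum('ij,ij->i', ab, a)))
```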
<p><strong>EDIT</strong> You can actually get the whole thing in one shot, it's ridiculous...</p>
<pre><code>>>> np.einsum('ij,jk,ki->i', a, b, a.T)
array([ 60, 672, 1932, 3840, 6396])
>>> np.einsum('ij,jk,ik->i', a, b, a)
array([ 60, 672, 1932, 3840, 6396])
</code></pre>
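<p>Read index by index, the three-operand call says: entry <code>i</code> of the result is the sum over <code>j</code> and <code>k</code> of <code>a[i, j] * b[j, k] * a[i, k]</code>, i.e. the quadratic form of row <code>i</code> with <code>b</code>. A minimal sketch of that reading (the name <code>quad</code> is just for illustration, and the <code>@</code> operator assumes Python 3.5+):</p>

```python
import numpy as np

a = np.arange(15).reshape(5, 3)
b = np.arange(9).reshape(3, 3)

# Entry i is the quadratic form a[i] @ b @ a[i]; the off-diagonal entries
# of a @ b @ a.T are never needed, so they are never computed.
quad = np.array([a[i] @ b @ a[i] for i in range(a.shape[0])])

print(np.array_equal(quad, np.einsum('ij,jk,ik->i', a, b, a)))
```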
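<p><strong>EDIT</strong> You don't want to let it figure out too much on its own, though... I've also added the OP's own answer to the question for comparison.</p>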
<pre><code>n, p = 10000, 200
a = np.random.rand(n, p)
b = np.random.rand(p, p)
In [2]: %timeit np.einsum('ij,jk,ki->i', a, b, a.T)
1 loops, best of 3: 1.3 s per loop
In [3]: %timeit np.einsum('ij,ij->i', np.dot(a, b), a)
10 loops, best of 3: 105 ms per loop
In [4]: %timeit np.diag(np.dot(np.dot(a, b), a.T))
1 loops, best of 3: 5.73 s per loop
In [5]: %timeit (a.dot(b) * a).sum(-1)
10 loops, best of 3: 115 ms per loop
</code></pre>
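<p>A note on the one-shot version: newer NumPy releases (1.12 and later, as far as I know) accept an <code>optimize</code> argument to <code>np.einsum</code>, and <code>np.einsum_path</code> reports the contraction order it would choose. The sketch below only checks that the results agree, on smaller arrays than the benchmark so it runs quickly. Whether <code>optimize=True</code> actually closes the gap to the two-step version is an assumption you would have to time on your own machine.</p>

```python
import numpy as np

np.random.seed(0)
n, p = 1000, 50  # deliberately smaller than the benchmark sizes
a = np.random.rand(n, p)
b = np.random.rand(p, p)

# Ask einsum which pairwise contraction order it would use.
path, description = np.einsum_path('ij,jk,ik->i', a, b, a, optimize='greedy')
print(description)

# One-shot call with optimization enabled vs. the manual two-step version.
one_shot = np.einsum('ij,jk,ik->i', a, b, a, optimize=True)
two_step = np.einsum('ij,ij->i', a.dot(b), a)

print(np.allclose(one_shot, two_step))
```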