Answers to the question: NumPy, returning indices within a 3D array

1 answer

Anonymous, 1 day ago


Approach 1

Here is one using <a href="http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html" rel="nofollow">NumPy broadcasting</a>:

<pre><code>R,C = np.where((A[:,None,None] == B).any(-1))
out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
</code></pre>

Approach 2

Assuming <code>A</code> and <code>B</code> hold non-negative integers, we can treat them as indices on a <code>2D</code> grid, so that <code>B</code> can be viewed as holding column indices per row. Once the <code>2D</code> grid corresponding to <code>B</code> is in place, we only need to keep the columns that intersect with <code>A</code>. Finally, we take the indices of the <code>True</code> values in that <code>2D</code> grid, which gives us the <code>R</code> and <code>C</code> values. This should be more memory-efficient.

So the alternative approach would look something like this:

<pre><code>ncols = B.max()+1
nrows = B.shape[0]
mask = np.zeros((nrows,ncols),dtype=bool)
mask[np.arange(nrows)[:,None],B] = 1
mask[:,~np.in1d(np.arange(mask.shape[1]),A)] = 0
R,C = np.where(mask.T)
out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
</code></pre>

Sample run

<pre><code>In [43]: A
Out[43]: array([0, 1, 2, 3])

In [44]: B
Out[44]:
array([[3, 2, 0],
       [0, 2, 1],
       [2, 3, 1],
       [3, 0, 1]])

In [45]: out
Out[45]: [array([0, 1, 3]), array([1, 2, 3]), array([0, 1, 2]), array([0, 2, 3])]
</code></pre>

Runtime test

Scaling up the dataset size by <code>100x</code>, here is a quick runtime test:

<pre><code>In [85]: def index_1din2d(A,B):
    ...:     R,C = np.where((A[:,None,None] == B).any(-1))
    ...:     out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
    ...:     return out
    ...:
    ...: def index_1din2d_initbased(A,B):
    ...:     ncols = B.max()+1
    ...:     nrows = B.shape[0]
    ...:     mask = np.zeros((nrows,ncols),dtype=bool)
    ...:     mask[np.arange(nrows)[:,None],B] = 1
    ...:     mask[:,~np.in1d(np.arange(mask.shape[1]),A)] = 0
    ...:     R,C = np.where(mask.T)
    ...:     out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
    ...:     return out
    ...:

In [86]: A = np.unique(np.random.randint(0,10000,(400)))
    ...: B = np.random.randint(0,10000,(400,300))
    ...:

In [87]: %timeit [np.where((B == x).sum(axis = 1))[0] for x in A]
1 loop, best of 3: 161 ms per loop # @Psidom's soln

In [88]: %timeit index_1din2d(A,B)
10 loops, best of 3: 91.5 ms per loop

In [89]: %timeit index_1din2d_initbased(A,B)
10 loops, best of 3: 33.4 ms per loop
</code></pre>

Further performance boost!

Alternatively, we can build the <code>2D</code> grid in the second approach in a transposed layout. The idea is to avoid the transpose in <code>R,C = np.where(mask.T)</code>, which appears to be the bottleneck. The modified version of the second approach and the associated timings are as follows:

<pre><code>In [135]: def index_1din2d_initbased_v2(A,B):
     ...:     nrows = B.max()+1
     ...:     ncols = B.shape[0]
     ...:     mask = np.zeros((nrows,ncols),dtype=bool)
     ...:     mask[B,np.arange(ncols)[:,None]] = 1
     ...:     mask[~np.in1d(np.arange(mask.shape[0]),A)] = 0
     ...:     R,C = np.where(mask)
     ...:     out = np.split(C,np.flatnonzero(R[1:]>R[:-1])+1)
     ...:     return out
     ...:

In [136]: A = np.unique(np.random.randint(0,10000,(400)))
     ...: B = np.random.randint(0,10000,(400,300))
     ...:

In [137]: %timeit index_1din2d_initbased(A,B)
10 loops, best of 3: 57.5 ms per loop

In [138]: %timeit index_1din2d_initbased_v2(A,B)
10 loops, best of 3: 25.9 ms per loop
</code></pre>
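One caveat about the approaches above: the <code>np.split</code> step cuts <code>C</code> wherever <code>R</code> increases, so a value of <code>A</code> that never occurs in <code>B</code> produces no group at all, and the output list no longer lines up with <code>A</code>. The sketch below checks the broadcasting approach against a plain loop on the sample data, then shows one possible way to keep the alignment. The names <code>index_loop</code> and <code>index_1din2d_safe</code> are hypothetical helpers introduced here for illustration; they are not part of the original answer.

```python
import numpy as np

def index_loop(A, B):
    # Baseline: for each value in A, the rows of B that contain it.
    return [np.flatnonzero((B == x).any(axis=1)) for x in A]

def index_1din2d(A, B):
    # Approach 1 (broadcasting), as given in the answer.
    R, C = np.where((A[:, None, None] == B).any(-1))
    return np.split(C, np.flatnonzero(R[1:] > R[:-1]) + 1)

def index_1din2d_safe(A, B):
    # Hypothetical variant: R is sorted, so searchsorted gives the start
    # of every group 1..len(A)-1, including groups that are empty.
    R, C = np.where((A[:, None, None] == B).any(-1))
    return np.split(C, np.searchsorted(R, np.arange(1, len(A))))

A = np.array([0, 1, 2, 3])
B = np.array([[3, 2, 0], [0, 2, 1], [2, 3, 1], [3, 0, 1]])

# On the sample data both versions agree with the loop baseline.
same = all(np.array_equal(a, b)
           for a, b in zip(index_loop(A, B), index_1din2d(A, B)))

# 7 does not occur in B: the plain split drops its group entirely.
A2 = np.array([0, 7, 2])
groups_plain = index_1din2d(A2, B)      # 2 groups for 3 values
groups_safe = index_1din2d_safe(A2, B)  # 3 groups, the middle one empty
```

The mask-based approaches have the same property, and in addition they return groups ordered by value rather than by position in <code>A</code>, so they match <code>A</code> element by element only when <code>A</code> is sorted and unique, as it is in the timings above where it comes from <code>np.unique</code>.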
