Python: a NumPy alternative to itertools.product

2 answers

User

#1 · edited 2024-10-02 08:17:32

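By explicitly specifying a compound (structured) dtype, you can avoid some of the problems that arise when NumPy tries to find a single catch-all dtype for mixed inputs.

Code plus some timings: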

import numpy as np
import itertools

def cartesian_product_mixed_type(*arrays):
    # Structured dtype with one field (f0, f1, ...) per input, so no common dtype is needed.
    arrays = *map(np.asanyarray, arrays),
    dtype = np.dtype([(f'f{i}', a.dtype) for i, a in enumerate(arrays)])
    out = np.empty((*map(len, arrays),), dtype)
    # Input i gets trailing length-1 axes so it broadcasts along its own axis of out.
    idx = slice(None), *itertools.repeat(None, len(arrays) - 1)
    for i, a in enumerate(arrays):
        out[f'f{i}'] = a[idx[:len(arrays) - i]]
    return out.ravel()

a = np.arange(4)
b = np.arange(*map(ord, ('A', 'D')), dtype=np.int32).view('U1')
c = np.arange(2.)

np.set_printoptions(threshold=10)

print(f'a={a}')
print(f'b={b}')
print(f'c={c}')

print('itertools')
print(list(itertools.product(a,b,c)))
print('numpy')
print(cartesian_product_mixed_type(a,b,c))

a = np.arange(100)
b = np.arange(*map(ord, ('A', 'z')), dtype=np.int32).view('U1')
c = np.arange(20.)

import timeit
# With number=1000, the total in seconds equals the mean time per call in milliseconds.
kwds = dict(globals=globals(), number=1000)

print()
print(f'a={a}')
print(f'b={b}')
print(f'c={c}')

print(f"itertools: {timeit.timeit('list(itertools.product(a,b,c))', **kwds):7.4f} ms")
print(f"numpy:     {timeit.timeit('cartesian_product_mixed_type(a,b,c)', **kwds):7.4f} ms")

a = np.arange(1000)
b = np.arange(1000, dtype=np.int32).view('U1')

print()
print(f'a={a}')
print(f'b={b}')

print(f"itertools: {timeit.timeit('list(itertools.product(a,b))', **kwds):7.4f} ms")
print(f"numpy:     {timeit.timeit('cartesian_product_mixed_type(a,b)', **kwds):7.4f} ms")

Sample output:

[output block missing in the source]
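
To illustrate the catch-all dtype issue mentioned at the top of this answer, here is a minimal sketch. The sample values and the field names `f0` and `f1` are chosen for illustration only (the field names mirror the ones the function above generates); this is not part of the original answer.

```python
import numpy as np

# Mixing ints and strings in a plain array forces one common dtype:
# NumPy converts everything to a Unicode string type.
mixed = np.array([(1, 'A'), (2, 'B')])
print(mixed.dtype.kind)         # 'U': the integers became strings

# A structured dtype keeps one typed field per column instead.
dtype = np.dtype([('f0', np.int64), ('f1', 'U1')])
typed = np.array([(1, 'A'), (2, 'B')], dtype=dtype)
print(typed['f0'].dtype.kind)   # 'i': still integers
print(typed['f1'].dtype.kind)   # 'U': one-character strings
```

With the structured version, numeric operations such as `typed['f0'].sum()` keep working, which is not the case once the integers have been turned into strings.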

User

#2 · edited 2024-10-02 08:17:32

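Generally speaking, if we picture optimization as a balance scale, memory and runtime are its two pans. In other words, memory optimization and runtime optimization are usually traded off against each other (not always, but most of the time). Now, regarding your question: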

Is there a way to create same object with numpy which works faster than itertools?

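Certainly there is, but keep in mind that abstraction buys you flexibility: itertools.product offers it and NumPy does not. If scalability is not an important factor in this case, you can do it with NumPy without giving up anything. Here is one way, using the column_stack, repeat, and tile functions: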

In [5]: np.column_stack((np.repeat(a, b.size),np.tile(b, a.size)))
Out[5]: 
array([['1', 'a'],
       ['1', 'b'],
       ['1', 'c'],
       ['2', 'a'],
       ['2', 'b'],
       ['2', 'c'],
       ['3', 'a'],
       ['3', 'b'],
       ['3', 'c']], dtype='<U21')

There are still ways to make this array take up less memory, by using lighter-weight types such as U2 or U1.
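
The answer's own code for this step did not survive in the source. As a rough sketch of the idea, and not necessarily what the author wrote, the array can be cast to a narrower string dtype with `astype`. The inputs are assumptions, and the integers are converted with `astype(str)` explicitly so that the example does not depend on implicit int-to-string promotion.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)   # assumed inputs
b = np.array(['a', 'b', 'c'])

# int64 converted to str yields a wide string dtype, which dominates the result.
pairs = np.column_stack((np.repeat(a, b.size).astype(str), np.tile(b, a.size)))

# Safe here only because every entry is a single character;
# longer strings would be silently truncated by the cast.
small = pairs.astype('U1')

print(pairs.dtype, pairs.nbytes)
print(small.dtype, small.nbytes)
```

For inputs with longer entries, the narrowest safe width would have to be chosen from the longest string actually present.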
