How to construct multiple vectors from multiple arrays

ids = 1 2 3 4 5
------------------------------
dataset = [(0.13, 2.05, null, null, null),
           (null, 0.23, null, 7.35, 5.60),
           (null, 0.61, 4.45, null, null)]

2 answers

User

#1 · edited 2024-10-03 11:14:14

Here is a NumPy-based approach that creates a sparse matrix (coo_matrix), with memory efficiency in focus:

import numpy as np
from scipy.sparse import coo_matrix

# Construct row IDs
lens = np.array([len(item) for item in dataset])
shifts_arr = np.zeros(lens.sum(),dtype=int)
shifts_arr[lens[:-1].cumsum()] = 1
row = shifts_arr.cumsum()

# Extract values from dataset into a NumPy array
arr = np.concatenate(dataset)

# Get the unique column IDs to be used for col-indexing into output array
col = np.unique(arr[:,0],return_inverse=True)[1]

# Determine the output shape
out_shp = (row.max()+1,col.max()+1)

# Finally create a sparse matrix with the row,col indices and column 1 (the values) of arr
sp_out = coo_matrix((arr[:,1],(row,col)), shape=out_shp)
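The row IDs built above repeat each row index once per (id, value) pair in that row. As a small sketch (not part of the original answer), the same array can probably also be obtained with np.repeat, which some readers may find easier to follow. The sample data is the one used in the sample run of this answer; the names row_cumsum and row_repeat are only illustrative.

```python
import numpy as np

# Sample data: each inner list holds (id, value) pairs for one output row.
dataset = [[(1, 0.13), (2, 2.05)],
           [(2, 0.23), (4, 7.35), (5, 5.60)],
           [(2, 0.61), (3, 4.45)]]

# Number of (id, value) pairs per row.
lens = np.array([len(item) for item in dataset])

# Construction from the answer: put a 1 where each new row starts, then cumsum.
shifts_arr = np.zeros(lens.sum(), dtype=int)
shifts_arr[lens[:-1].cumsum()] = 1
row_cumsum = shifts_arr.cumsum()

# Alternative (an assumption, not the answer's method): repeat row i lens[i] times.
row_repeat = np.repeat(np.arange(len(lens)), lens)

print(row_cumsum)  # [0 0 1 1 1 2 2]
print(row_repeat)  # [0 0 1 1 1 2 2]
```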

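Note that if the IDs are meant to be the column numbers of the output array, you can replace the use of np.unique, which gives us such unique IDs, by converting the ID column directly into integer column indices.

This gives us a good performance boost!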
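The original snippet for this replacement did not survive on this page, so the following is only a sketch of what it might look like. It assumes the IDs are 1-based integers (as in the sample data), so that ID k maps to column k-1; the rest follows the answer's code.

```python
import numpy as np
from scipy.sparse import coo_matrix

dataset = [[(1, 0.13), (2, 2.05)],
           [(2, 0.23), (4, 7.35), (5, 5.60)],
           [(2, 0.61), (3, 4.45)]]

# Row IDs, built as in the answer.
lens = np.array([len(item) for item in dataset])
shifts_arr = np.zeros(lens.sum(), dtype=int)
shifts_arr[lens[:-1].cumsum()] = 1
row = shifts_arr.cumsum()

# Stack all (id, value) pairs into a 2-column float array.
arr = np.concatenate(dataset)

# Assumption: IDs are 1-based integers, so the column index is simply id - 1.
# This skips the sort that np.unique performs internally.
col = arr[:, 0].astype(int) - 1

out_shp = (row.max() + 1, col.max() + 1)
sp_out = coo_matrix((arr[:, 1], (row, col)), shape=out_shp)
dense = sp_out.toarray()
print(dense)
```

One difference worth keeping in mind: np.unique compacts the columns, so an ID that never occurs gets no column at all, whereas the direct conversion leaves an all-zero column for it.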

Sample run:

In [264]: dataset = [[(1, 0.13), (2, 2.05)],
     ...:            [(2, 0.23), (4, 7.35), (5, 5.60)],
     ...:            [(2, 0.61), (3, 4.45)]]

In [265]: sp_out.todense() # Using .todense() to show output
Out[265]: 
matrix([[ 0.13,  2.05,  0.  ,  0.  ,  0.  ],
        [ 0.  ,  0.23,  0.  ,  7.35,  5.6 ],
        [ 0.  ,  0.61,  4.45,  0.  ,  0.  ]])
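The sparse result shows missing entries as 0, while the question's desired output marks them as null. If a dense array with explicit missing values is acceptable, one possible sketch is to pre-fill an array with NaN and assign the values by index. This is not part of the original answer; it assumes 1-based integer IDs, and it builds the row indices with np.repeat, which yields the same values as the cumsum construction for this data.

```python
import numpy as np

dataset = [[(1, 0.13), (2, 2.05)],
           [(2, 0.23), (4, 7.35), (5, 5.60)],
           [(2, 0.61), (3, 4.45)]]

# One row index per (id, value) pair.
lens = np.array([len(item) for item in dataset])
row = np.repeat(np.arange(len(lens)), lens)

# Stack all pairs; column 0 holds IDs, column 1 holds values.
arr = np.concatenate(dataset)

# Assumption: IDs are 1-based integers, so ID k goes to column k - 1.
col = arr[:, 0].astype(int) - 1

# Start from an all-NaN array and fill in the known values.
out = np.full((row.max() + 1, col.max() + 1), np.nan)
out[row, col] = arr[:, 1]
print(out)
```

Unlike the sparse matrix, this allocates the full dense array, so it gives up the memory savings in exchange for distinguishing a missing entry from a genuine 0.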

User

#2 · edited 2024-10-03 11:14:14

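You can convert each element of the dataset into a dictionary and then build a pandas DataFrame, which returns something close to the desired output. If a 2D NumPy array is needed, the DataFrame can be converted with the to_numpy() method (the as_matrix() method used originally was removed in pandas 1.0). Sorting the columns keeps the IDs in ascending order, since recent pandas versions otherwise order the columns by first appearance:

import pandas as pd
pd.DataFrame([dict(x) for x in dataset]).sort_index(axis=1).to_numpy()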

# array([[ 0.13,  2.05,   nan,   nan,   nan],
#        [  nan,  0.23,   nan,  7.35,  5.6 ],
#        [  nan,  0.61,  4.45,   nan,   nan]])
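With the DataFrame approach, an ID that never occurs in the data gets no column. If the full list of IDs is known in advance, a sketch like the following could force one column per ID. The ids list here is taken from the question's header row and is otherwise an assumption; reindex fills any missing column with NaN.

```python
import pandas as pd

dataset = [[(1, 0.13), (2, 2.05)],
           [(2, 0.23), (4, 7.35), (5, 5.60)],
           [(2, 0.61), (3, 4.45)]]

# Assumed full list of IDs (from the question's header row).
ids = [1, 2, 3, 4, 5]

# One dict per row, then force the column order and presence to match ids.
df = pd.DataFrame([dict(x) for x in dataset]).reindex(columns=ids)
result = df.to_numpy()
print(result)
```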
