Shuffling a large sparse matrix

Published 2024-05-19 07:57:44

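I am having trouble working with large matrices. Here is the story:

I have a large matrix (up to 20 million x 20 million, rows x columns).
Because its rows are sparse, I store it as a scipy sparse CSR matrix.
In one part of my main algorithm I need to randomly sample a set of rows (say, 1000 rows) from this matrix.

When the number of columns is too large, I cannot extract all the rows at once: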

import random

# get a random batch of size b (n is the number of rows of X)
index = random.sample(range(n), b)
X_batch = X[index]

My current solution is to extract the rows in smaller batches and do the computation batch by batch:

import math
import random

import numpy as np

# get a random batch of size b
index = random.sample(range(n), b)

# calculate number of batches
total_mem_batch = 1e9  # a large number representing the total available memory

# nnzX is the average number of nonzeros per row; keep at least one row per batch
batch_size = max(1, int(total_mem_batch // nnzX))
num_batches = math.ceil(b / batch_size)
result = np.zeros(d)

for j in range(num_batches):
    # calculate start/end positions of this batch within index
    startIdx = batch_size * j
    endIdx = min(batch_size * (j + 1), b)
    rows = index[startIdx:endIdx]

    batch_X = X[rows, :]
    batch_Y = Y[rows]
    batch_bias = bias[rows]

    # do main operation
    result += ...
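
As a side note on the batch size calculation above, here is a minimal sketch of deriving it from the CSR arrays themselves rather than from a hand-set constant. The helper name `rows_per_batch` and the byte-based budget are my own assumptions, not part of the code above, and the estimate only counts the storage of the extracted rows, not any temporaries created by the main operation.

```python
import numpy as np
import scipy.sparse as sp


def rows_per_batch(X, mem_budget_bytes):
    """Hypothetical helper: roughly how many rows of CSR matrix X fit in the budget."""
    n_rows = X.shape[0]
    # Each stored nonzero costs one value plus one column index.
    bytes_per_nnz = X.data.itemsize + X.indices.itemsize
    avg_nnz_per_row = X.nnz / max(n_rows, 1)
    # Each row additionally costs one entry in indptr.
    bytes_per_row = avg_nnz_per_row * bytes_per_nnz + X.indptr.itemsize
    # Always return at least one row so the batch loop makes progress.
    return max(1, int(mem_budget_bytes // bytes_per_row))


# Small random stand-in for the real matrix (assumed data, for illustration only).
X_demo = sp.random(1000, 5000, density=0.01, format="csr", random_state=0)
demo_batch_size = rows_per_batch(X_demo, mem_budget_bytes=64 * 1024)
```

Since the rows have different lengths, an average-based estimate can undershoot for a batch that happens to contain dense rows, so leaving some headroom in the budget seems prudent.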

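The bottleneck now is retrieving a set of rows from the matrix. Because the index array is unordered, this amounts to random access to the rows of the input matrix X, which is much slower than reading the rows sequentially.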
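
One cheap thing to try, sketched below, is sorting the sampled indices before indexing, so the rows are at least visited in increasing order. This assumes the main operation does not depend on the order of rows within a batch (it is elided above, so I cannot be sure), and the small random `X`, `Y` and `bias` here are stand-ins with assumed shapes. The sample is still uniform, since sorting only reorders it. Whether it helps in practice depends on where the time actually goes, so it is worth benchmarking on the real data.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# Small stand-ins for the real data (assumed: X is n x d, Y and bias have length n).
n, d, b = 10_000, 500, 1_000
X = sp.random(n, d, density=0.01, format="csr", random_state=0)
Y = rng.standard_normal(n)
bias = rng.standard_normal(n)

# Sample b distinct row numbers, then sort them so access is monotone.
index = np.sort(rng.choice(n, size=b, replace=False))

# The same sorted index is used everywhere, so rows stay aligned with Y and bias.
X_batch = X[index]
Y_batch = Y[index]
bias_batch = bias[index]
```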

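My question is: is there a way to either

shuffle the input matrix every once in a while (the order of the rows does not matter, so it can be shuffled in place), so that the elements can then be read sequentially, or
access the rows of a large matrix at random more quickly?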
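
For the first option, a minimal sketch of what an occasional shuffle could look like with scipy is given below. Two caveats: indexing with a permutation builds a new CSR matrix rather than shuffling in place, so it temporarily needs memory for a second copy of X, and taking consecutive slices of a shuffled matrix samples without replacement within one pass, which is not quite the same as drawing a fresh random sample each time. The data here are small random stand-ins, not the real matrix.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)

# Small stand-ins for the real data (assumed: X is n x d, Y has length n).
n, d, b = 10_000, 500, 1_000
X = sp.random(n, d, density=0.01, format="csr", random_state=0)
Y = rng.standard_normal(n)

# Occasional shuffle: this is the one expensive random-access pass.
perm = rng.permutation(n)
X_shuf = X[perm]  # new CSR matrix with rows in permuted order (a copy)
Y_shuf = Y[perm]  # keep the targets aligned with the permuted rows

# Afterwards every batch is a contiguous block of rows.
num_batches = 0
for start in range(0, n, b):
    stop = min(start + b, n)
    batch_X = X_shuf[start:stop]
    batch_Y = Y_shuf[start:stop]
    num_batches += 1
```

A contiguous row slice of a CSR matrix only touches one consecutive range of the underlying `data` and `indices` arrays, which is why the reads after the shuffle are sequential.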

Thank you for reading my post.

Best,
