Creating a sparse matrix with pandas and filling it at index [x, y] with the values from one column of a .dat file

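import csv
import numpy as np

# Read the tab-separated file, skipping the header row.
with open("train.dat", "rt") as f:
    reader = csv.reader(f, delimiter="\t")
    next(reader)
    data = np.array(list(reader), dtype=float)

# userID (column 0) indexes the rows, artistID (column 1) indexes the columns.
row = int(data[:, 0].max()) + 1
col = int(data[:, 1].max()) + 1

# Dense matrix pre-filled with NaN; only the listed cells receive a weight.
empty = np.full((row, col), np.nan)
for d in data:
    empty[int(d[0]), int(d[1])] = d[2]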

userID  artistID  weight
45      7         0.7114779874213837
204     144       0.46399999999999997
36      650       2.4232887490165225
140     146       1.0146699266503667
170     31        1.4124783362218372
240     468       0.6529992406985573

1 Answer

Community user

Answer #1 · Posted 2024-06-03 10:53:24

With the sample data copied to a file (and with pandas imported as pd and scipy.sparse imported as sparse):

In [290]: data = pd.read_csv('stack48133358.txt',delim_whitespace=True)
In [291]: data
Out[291]: 
   userID  artistID    weight
0      45         7  0.711478
1     204       144  0.464000
2      36       650  2.423289
3     140       146  1.014670
4     170        31  1.412478
5     240       468  0.652999
In [292]: M = sparse.csr_matrix((data.weight, (data.userID, data.artistID)))
In [293]: M
Out[293]: 
<241x651 sparse matrix of type '<class 'numpy.float64'>'
    with 6 stored elements in Compressed Sparse Row format>
In [294]: print(M)
  (36, 650)     2.42328874902
  (45, 7)       0.711477987421
  (140, 146)    1.01466992665
  (170, 31)     1.41247833622
  (204, 144)    0.464
  (240, 468)    0.652999240699
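
The session above depends on imports that are not shown. The following is a minimal, self-contained sketch of the same approach, under a few assumptions: `io.StringIO` stands in for the real file, `sep=r"\s+"` is used in place of `delim_whitespace=True` (which recent pandas releases deprecate), and the final conversion to a pandas sparse DataFrame is an optional extra that the answer itself does not perform.

```python
import io

import pandas as pd
from scipy import sparse

# Sample rows from the question; io.StringIO is a stand-in for the real file.
raw = io.StringIO(
    "userID\tartistID\tweight\n"
    "45\t7\t0.7114779874213837\n"
    "204\t144\t0.46399999999999997\n"
    "36\t650\t2.4232887490165225\n"
    "140\t146\t1.0146699266503667\n"
    "170\t31\t1.4124783362218372\n"
    "240\t468\t0.6529992406985573\n"
)

# Split on any run of whitespace, so tabs and spaces are both accepted.
data = pd.read_csv(raw, sep=r"\s+")

# COO-style constructor: (values, (row_indices, col_indices)).
# The shape is inferred as (max userID + 1, max artistID + 1).
M = sparse.csr_matrix(
    (
        data["weight"].to_numpy(),
        (data["userID"].to_numpy(), data["artistID"].to_numpy()),
    )
)

print(M.shape)  # (241, 651)
print(M.nnz)    # 6

# Optional (not part of the answer): wrap the result in a pandas sparse DataFrame.
df_sparse = pd.DataFrame.sparse.from_spmatrix(M)
```

Unlike the NaN-filled dense array in the question, cells that are not listed here are implicit zeros, and only the six stored values occupy memory. If the same (userID, artistID) pair occurs more than once, this constructor sums the duplicate weights.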

I can also load the file with genfromtxt:

[The code block that followed here was not preserved in the source.]
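
As a rough sketch of what loading with genfromtxt could look like (this is an illustration, not the answer's original code): it assumes the same three-column layout, uses `io.StringIO` as a stand-in for the file, and skips the header row with `skip_header=1`. Because genfromtxt returns floats here, the two index columns are cast to integers before building the matrix.

```python
import io

import numpy as np
from scipy import sparse

# Sample rows from the question; io.StringIO is a stand-in for the real file.
raw = io.StringIO(
    "userID\tartistID\tweight\n"
    "45\t7\t0.7114779874213837\n"
    "204\t144\t0.46399999999999997\n"
    "36\t650\t2.4232887490165225\n"
    "140\t146\t1.0146699266503667\n"
    "170\t31\t1.4124783362218372\n"
    "240\t468\t0.6529992406985573\n"
)

# Default delimiter is any whitespace; skip_header=1 drops the column names.
arr = np.genfromtxt(raw, skip_header=1)  # 2-D float array, one row per record

# Index arrays must be integers, so cast the first two columns.
rows = arr[:, 0].astype(int)
cols = arr[:, 1].astype(int)

M = sparse.csr_matrix((arr[:, 2], (rows, cols)))
print(M.shape)  # (241, 651)
```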
