Splitting a sparse matrix into train and test sets

Posted 2024-09-28 01:32:51


Hi, I have a sparse CSR matrix that is built like this:

import numpy as np
import pandas as pd
from scipy import sparse

userid = list(np.sort(matrix.USERID.unique()))  # Get our unique customers
artid = list(matrix.ARTID.unique())  # Get our unique products that were purchased
click = list(matrix.TOTALCLICK)

# Get the associated row indices
rows = pd.Categorical(matrix.USERID, categories=userid).codes

# Get the associated column indices
cols = pd.Categorical(matrix.ARTID, categories=artid).codes

item_sparse = sparse.csr_matrix((click, (rows, cols)), shape=(len(userid), len(artid)))

The original matrix contains users' interactions with products on a website.

In the end I get a sparse matrix like this:

  (0, 4136) 1
  (0, 5553) 1
  (0, 9089) 1
  (0, 24104) 3
  (0, 28061) 2
  (1, 0)    2
  (1, 224)  1
  (1, 226)  1
  (1, 324)  2
  (1, 341)  1
  (1, 530)  1
  (1, 642)  1
  (1, 658)  1

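How can I split this sparse matrix grouped by the first index (the user), taking the first 80% of each user's rows for the training set and the remaining 20% for the test set? I should end up with two matrices.

Train: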

  (0, 4136) 1
  (0, 5553) 1
  (0, 9089) 1
  (1, 0)    2
  (1, 224)  1
  (1, 226)  1
  (1, 324)  2
  (1, 341)  1
  (1, 530)  1

Test:

  (0, 24104)    3
  (0, 28061)    2
  (1, 642)      1
  (1, 658)      1

2 answers

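Use the sklearn API train_test_split. You pass this method three arguments: your matrix, the split ratio, and a random state. The random state is useful if you want to get the same split again later.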
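A minimal sketch of that call, assuming a small made-up matrix in place of item_sparse (the toy data and the seed 42 are illustrative and not from the question). Note that train_test_split splits whole rows, so here it separates users rather than each user's individual entries:

```python
import numpy as np
from scipy import sparse
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for item_sparse: 10 users x 6 items of click counts
rng = np.random.default_rng(0)
toy = sparse.csr_matrix(rng.integers(0, 3, size=(10, 6)))

# The three arguments from the answer: matrix, split ratio, random state
train, test = train_test_split(toy, test_size=0.2, random_state=42)

print(train.shape, test.shape)
```

Both results stay sparse, and reusing the same random_state reproduces the same split.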

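You can use the StratifiedShuffleSplit class in scikit-learn (or StratifiedKFold if you don't want to shuffle, but then you need to make 5 splits to get an 80%/20% train/test split, because there is no other way to control the test size):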

import sklearn.model_selection
import numpy as np

# Array similar to your structure: one row per (user, item, clicks) entry
x = np.asarray([[0, 4136, 1], [0, 5553, 1], [0, 9089, 1], [1, 0, 2],
                [1, 224, 1], [1, 226, 1], [1, 324, 2], [1, 341, 1], [1, 530, 1]])
# Get train and test indices using x[:, 0] to define the 'classes'
cv = sklearn.model_selection.StratifiedShuffleSplit(n_splits=1, test_size=0.2)
# Note, X isn't actually used in the method, np.zeros(n_samples) would also work
# Also note that cv.split is an iterator with 1 element (split),
# hence getting the first element of the list
train_idx, test_idx = list(cv.split(X=x, y=x[:, 0]))[0]

print("Training")
for i in train_idx:
    print(x[i,:2], x[i,2])
print("Test")
for i in test_idx: 
    print(x[i,:2], x[i,2])
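For the StratifiedKFold alternative mentioned above, a rough sketch could look like the following. The data is hypothetical (2 users with 10 interactions each) so that every user has enough entries for 5 folds, and keeping the last fold is an arbitrary choice made here for illustration:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical data: columns are (user, item, clicks), 10 rows per user
users = np.repeat([0, 1], 10)
items = np.arange(20)
clicks = np.ones(20, dtype=int)
x = np.column_stack([users, items, clicks])

# 5 folds without shuffling: each fold holds out about 20% of every user's rows
cv = StratifiedKFold(n_splits=5, shuffle=False)
folds = list(cv.split(X=x, y=x[:, 0]))

# Keep a single fold as the 80%/20% split (here the last one)
train_idx, test_idx = folds[-1]
print(len(train_idx), len(test_idx))
```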

I don't have much experience with sparse matrices, so I hope you can adapt my example as needed.
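One possible adaptation to the sparse matrix itself, without scikit-learn, is sketched below. It is not from either answer: split_per_user is a hypothetical helper, "first 80%" is taken to mean the first entries of each row in column-index order, and the per-user train count is rounded down. All three are assumptions.

```python
import numpy as np
from scipy import sparse


def split_per_user(mat, train_frac=0.8):
    """Split each row's stored entries into a leading train part and a trailing test part."""
    csr = sparse.csr_matrix(mat, copy=True)  # copy so the input is not reordered
    csr.sort_indices()                       # order entries within a row by column index
    counts = np.diff(csr.indptr)             # number of stored entries per user
    rows = np.repeat(np.arange(csr.shape[0]), counts)  # row index of every entry
    # Rank of every entry within its own row: 0, 1, 2, ...
    pos = np.arange(csr.nnz) - np.repeat(csr.indptr[:-1], counts)
    # Assumption: round the train count down for each user
    n_train = np.floor(train_frac * counts).astype(int)
    is_train = pos < np.repeat(n_train, counts)

    def build(mask):
        return sparse.csr_matrix(
            (csr.data[mask], (rows[mask], csr.indices[mask])), shape=csr.shape
        )

    return build(is_train), build(~is_train)


# The sample entries shown in the question
r = [0] * 5 + [1] * 8
c = [4136, 5553, 9089, 24104, 28061, 0, 224, 226, 324, 341, 530, 642, 658]
v = [1, 1, 1, 3, 2, 2, 1, 1, 2, 1, 1, 1, 1]
m = sparse.csr_matrix((v, (r, c)), shape=(2, 28062))

train, test = split_per_user(m)
print(train)
print(test)
```

Both outputs keep the shape of the input, so train + test reproduces the original matrix. With only a handful of entries per user the realised ratio is coarse, so the counts will not match the expected output in the question exactly.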
