Applying singular value decomposition to sparse matrices

Posted 2024-06-01 06:55:40

What should I take into account when performing singular value decomposition on a sparse matrix?

The matrix is very sparse, and I imputed the missing values with 0. Do I need any other techniques? My code is shown below.

import pandas as pd
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import Normalizer
from sklearn.metrics.pairwise import cosine_similarity

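# ua.test is tab-separated; 'xcx' names the fourth (timestamp) column, which is not loaded
r_cols = ['user_id', 'movie_id', 'rating', 'xcx']
data = pd.read_csv('ml-100k/ua.test', sep='\t', names=r_cols, usecols=['user_id', 'movie_id', 'rating'], encoding='latin-1')
# movie x user rating matrix; unrated cells are filled with 0
dtm = data.pivot(index='movie_id', columns='user_id', values='rating').fillna(0)
np.savetxt("pivot.csv", dtm, delimiter=",")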

# without matrix factorization: cosine similarity between the raw movie rows
cosine_sim = cosine_similarity(dtm, dtm)
np.savetxt("foo13.csv", cosine_sim, delimiter=",")

# with matrix factorization: project movies onto 200 components, then L2-normalize each row
lsa = TruncatedSVD(200, algorithm='arpack')
dtm_lsa = lsa.fit_transform(dtm)
dtm_lsa = Normalizer(copy=False).fit_transform(dtm_lsa)
# rows are unit length, so the dot product is the cosine similarity
similarity = dtm_lsa @ dtm_lsa.T
np.savetxt("foo12.csv", similarity, delimiter=",")

If I have missed anything, please feel free to point it out.
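
One thing that may be worth considering for a matrix this sparse is keeping it in a SciPy sparse format instead of a dense zero-filled pivot, since TruncatedSVD accepts sparse input directly. The following is only a minimal sketch under stated assumptions: the toy ratings frame, the sizes, and n_components=10 are hypothetical stand-ins for ua.test and the 200 components used above. It does not change the modelling assumption that an unrated cell counts as 0.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import normalize

# Hypothetical toy ratings standing in for the (user_id, movie_id, rating) rows of ua.test
rng = np.random.default_rng(0)
n_users, n_movies, n_ratings = 50, 30, 400
toy = pd.DataFrame({
    "user_id": rng.integers(1, n_users + 1, n_ratings),
    "movie_id": rng.integers(1, n_movies + 1, n_ratings),
    "rating": rng.integers(1, 6, n_ratings),
}).drop_duplicates(["user_id", "movie_id"])

# Map raw ids to contiguous row/column positions
movie_codes, movie_index = pd.factorize(toy["movie_id"], sort=True)
user_codes, user_index = pd.factorize(toy["user_id"], sort=True)

# movie x user matrix in CSR form; only the observed ratings are stored
ratings = csr_matrix(
    (toy["rating"].to_numpy(dtype=float), (movie_codes, user_codes)),
    shape=(len(movie_index), len(user_index)),
)

# TruncatedSVD works on the sparse matrix without densifying it
svd = TruncatedSVD(n_components=10, algorithm="arpack")
movie_factors = normalize(svd.fit_transform(ratings))  # L2-normalize each row

# Cosine similarity between movies in the reduced space
similarity_sparse = movie_factors @ movie_factors.T

# Share of variance retained, one possible guide when choosing n_components
explained = svd.explained_variance_ratio_.sum()
print(f"variance retained by 10 components: {explained:.3f}")
```

The similarity matrix has one row and column per entry of movie_index, so that index can be used to map positions back to movie ids. Whether the retained variance is acceptable depends on the data, so the printed value is something to inspect rather than a target.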

