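How to use a full dataset as a Surprise trainset
I found an existing solution that I would like to reuse, but I am running into problems while building my recommender system. Let me explain my process.
Loading my dataset: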
from surprise import Dataset, KNNBasic, Reader

# datacf1 is my pandas DataFrame of ratings
cols = ['UserId', 'Product', 'Rating']
reader = Reader()
datacolf = Dataset.load_from_df(datacf1[cols], reader)
Then I have to create a trainset:
trainSet = datacolf.build_full_trainset()
I compute the user similarities:
sim_options = {'name': 'cosine', 'user_based': True}
model = KNNBasic(sim_options=sim_options)
model.fit(trainSet)
simsMatrix = model.compute_similarities()
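For intuition about what the `cosine` option measures, here is a minimal pure-Python sketch, assuming the similarity is taken over the items both users have rated. The function name `cosine_sim` and the dict-of-ratings inputs are illustrative and not part of Surprise, whose own implementation may differ in details such as minimum support.

```python
from math import sqrt


def cosine_sim(ratings_u, ratings_v):
    """Cosine similarity of two users over their co-rated items.

    Each argument maps item id -> rating (illustrative input format).
    """
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0  # no shared items, so no evidence of similarity

    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # guard against all-zero ratings
    return dot / (norm_u * norm_v)


alice = {'A': 5, 'B': 3, 'C': 4}
bob = {'A': 4, 'B': 2, 'D': 5}
print(cosine_sim(alice, bob))
```

Only items `A` and `B` contribute here, because they are the only ones both users rated.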
Then I need to get the top-N recommendations for my user:
testSubject = '1000115'
k = 10
testUserInnerID = trainSet.to_inner_uid(testSubject)
similarityRow = simsMatrix[testUserInnerID]
The problem is that at this stage I get the following error:
ValueError: User 1000115 is not part of the trainset.
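I tried using my full dataset datacf1 directly, but to_inner_uid is a method of Surprise's Trainset class. This is only the beginning of the code I am trying to use, and the rest of it relies on the Trainset class as well.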
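One possible cause, assuming the `UserId` column of `datacf1` holds integers: a trainset built from a DataFrame keeps raw ids in the type they have in the DataFrame, so the string `'1000115'` would not match the integer `1000115`. The sketch below shows a hypothetical helper, `resolve_inner_uid`, that tries the id as given, then as `int`, then as `str`. `StubTrainset` is only a stand-in so the example runs without Surprise; it mimics the `ValueError` shown above.

```python
class StubTrainset:
    """Stand-in for a Surprise Trainset (hypothetical, for illustration only)."""

    def __init__(self, raw_to_inner):
        self._raw_to_inner = raw_to_inner

    def to_inner_uid(self, raw_uid):
        try:
            return self._raw_to_inner[raw_uid]
        except KeyError:
            raise ValueError(f"User {raw_uid} is not part of the trainset.")


def resolve_inner_uid(trainset, raw_uid):
    """Look up raw_uid as given, then as int, then as str."""
    variants = [raw_uid]
    try:
        variants.append(int(raw_uid))
    except (TypeError, ValueError):
        pass  # not convertible to int; skip that variant
    variants.append(str(raw_uid))

    for variant in variants:
        try:
            return trainset.to_inner_uid(variant)
        except ValueError:
            continue  # this spelling of the id is unknown; try the next one
    raise ValueError(f"User {raw_uid!r} not found; tried {variants!r}")


stub = StubTrainset({1000115: 0, 1000116: 1})  # integer raw ids
print(resolve_inner_uid(stub, '1000115'))
```

With the objects from the question this would be called as `resolve_inner_uid(trainSet, testSubject)`. If the column really is integer, `trainSet.to_inner_uid(int(testSubject))` is likely the simpler fix.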
The full code that I am trying to adapt so that it works with my own data is as follows:
from MovieLens import MovieLens
from surprise import KNNBasic
import heapq
from collections import defaultdict
from operator import itemgetter

testSubject = '85'
k = 10

# Load our data set and compute the user similarity matrix
ml = MovieLens()
data = ml.loadMovieLensLatestSmall()

trainSet = data.build_full_trainset()

sim_options = {'name': 'cosine',
               'user_based': True
               }

model = KNNBasic(sim_options=sim_options)
model.fit(trainSet)
simsMatrix = model.compute_similarities()

# Get top N similar users to our test subject
# (Alternate approach would be to select users up to some similarity threshold - try it!)
testUserInnerID = trainSet.to_inner_uid(testSubject)
similarityRow = simsMatrix[testUserInnerID]

similarUsers = []
for innerID, score in enumerate(similarityRow):
    if (innerID != testUserInnerID):
        similarUsers.append( (innerID, score) )

kNeighbors = heapq.nlargest(k, similarUsers, key=lambda t: t[1])

# Get the stuff they rated, and add up ratings for each item, weighted by user similarity
candidates = defaultdict(float)
for similarUser in kNeighbors:
    innerID = similarUser[0]
    userSimilarityScore = similarUser[1]
    theirRatings = trainSet.ur[innerID]
    for rating in theirRatings:
        candidates[rating[0]] += (rating[1] / 5.0) * userSimilarityScore

# Build a dictionary of stuff the user has already seen
watched = {}
for itemID, rating in trainSet.ur[testUserInnerID]:
    watched[itemID] = 1

# Get top-rated items from similar users:
pos = 0
for itemID, ratingSum in sorted(candidates.items(), key=itemgetter(1), reverse=True):
    if not itemID in watched:
        movieID = trainSet.to_raw_iid(itemID)
        print(ml.getMovieName(int(movieID)), ratingSum)
        pos += 1
        if (pos > 10):
            break
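So I would be grateful if anyone could help me work out how to use my full data as a trainset, or how to convert the full recommendation code so that it works with my own data.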
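A rough sketch of how the script above might be adapted to arbitrary data, assuming the only MovieLens-specific parts are the loader and `ml.getMovieName`. The function `top_n_for_user` and the stand-in `TinyTrainset` are hypothetical names. The function relies only on `to_inner_uid`, `to_raw_iid` and `ur`, which the script already uses on the Surprise trainset, and it returns raw item ids instead of movie titles. `max_rating=5.0` mirrors the division by 5.0 in the script and assumes a 1 to 5 rating scale.

```python
import heapq
from collections import defaultdict


def top_n_for_user(trainset, sims, raw_uid, k=10, n=10, max_rating=5.0):
    """Return up to n (raw_item_id, score) pairs for raw_uid."""
    user = trainset.to_inner_uid(raw_uid)

    # k most similar users, excluding the user themselves
    neighbors = heapq.nlargest(
        k,
        ((other, score) for other, score in enumerate(sims[user]) if other != user),
        key=lambda pair: pair[1],
    )

    # Sum the neighbors' normalized ratings, weighted by similarity
    candidates = defaultdict(float)
    for other, score in neighbors:
        for item, rating in trainset.ur[other]:
            candidates[item] += (rating / max_rating) * score

    # Drop items the user has already rated, then keep the n best
    seen = {item for item, _ in trainset.ur[user]}
    ranked = sorted(candidates.items(), key=lambda pair: pair[1], reverse=True)
    unseen = [(trainset.to_raw_iid(item), score)
              for item, score in ranked if item not in seen]
    return unseen[:n]


class TinyTrainset:
    """Hypothetical stand-in exposing the three Trainset members used above."""

    def __init__(self, ratings):
        # ratings: iterable of (raw_uid, raw_iid, rating)
        self._uids, self._iids = {}, {}
        self.ur = defaultdict(list)
        for raw_uid, raw_iid, rating in ratings:
            u = self._uids.setdefault(raw_uid, len(self._uids))
            i = self._iids.setdefault(raw_iid, len(self._iids))
            self.ur[u].append((i, rating))
        self._raw_iids = {inner: raw for raw, inner in self._iids.items()}

    def to_inner_uid(self, raw_uid):
        try:
            return self._uids[raw_uid]
        except KeyError:
            raise ValueError(f"User {raw_uid} is not part of the trainset.")

    def to_raw_iid(self, inner_iid):
        return self._raw_iids[inner_iid]


demo = TinyTrainset([
    (1000115, 'A', 5.0),
    (1000116, 'A', 4.0), (1000116, 'B', 5.0),
    (1000117, 'C', 2.0),
])
demo_sims = [[1.0, 0.9, 0.1],
             [0.9, 1.0, 0.0],
             [0.1, 0.0, 1.0]]  # made-up user-user similarities
recommendations = top_n_for_user(demo, demo_sims, 1000115, k=2)
print(recommendations)
```

With Surprise, the call would presumably be `top_n_for_user(trainSet, simsMatrix, raw_uid)`, where `raw_uid` has the same type as the values in the `UserId` column.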