Can I use my full dataset as the training set for a recommender system?

Posted 2024-10-01 00:20:02

How do I use my full dataset as the trainset in Surprise?

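I found an existing solution that I would like to reuse, but I ran into a problem while building my recommender system. Let me walk through what I did.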

Loading my dataset:

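from surprise import Dataset, KNNBasic, Reader

# datacf1 is a pandas DataFrame with one row per rating
cols = ['UserId', 'Product', 'Rating']
reader = Reader()  # default rating_scale is (1, 5)
datacolf = Dataset.load_from_df(datacf1[cols], reader)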

Then I have to create a trainset:

trainSet = datacolf.build_full_trainset()

I compute the user similarities:

sim_options = {'name': 'cosine', 'user_based': True}

model = KNNBasic(sim_options=sim_options)
model.fit(trainSet)
simsMatrix = model.compute_similarities()
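For readers unfamiliar with the 'cosine' option, here is a minimal pure-Python sketch of a cosine similarity between two users, computed over the items both have rated. It is an illustration only, not Surprise's implementation; the function name `cosine_on_common_items` and the toy ratings are made up for this example.

```python
import math

def cosine_on_common_items(ratings_u, ratings_v):
    """Cosine similarity of two users, restricted to items both have rated.

    ratings_u / ratings_v map item id -> rating (hypothetical toy format).
    """
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0  # no co-rated items: treat the users as unrelated
    dot = sum(ratings_u[i] * ratings_v[i] for i in common)
    norm_u = math.sqrt(sum(ratings_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(ratings_v[i] ** 2 for i in common))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # guard against all-zero ratings
    return dot / (norm_u * norm_v)

# Toy data: alice and bob agree in proportion on A and B; carol shares nothing with alice.
alice = {'A': 4.0, 'B': 2.0}
bob = {'A': 2.0, 'B': 1.0, 'C': 5.0}
carol = {'C': 3.0}

sim_ab = cosine_on_common_items(alice, bob)
sim_ac = cosine_on_common_items(alice, carol)
print(sim_ab, sim_ac)
```

With 'user_based': True, the similarity matrix holds one such score for every pair of users, indexed by inner user id.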

Then I need to get the top-N recommendations for my user:

testSubject = '1000115'
k = 10
testUserInnerID = trainSet.to_inner_uid(testSubject)
similarityRow = simsMatrix[testUserInnerID]

The problem is that at this stage I get the following error:

ValueError: User 1000115 is not part of the trainset.

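I tried using my full dataset datacf1 directly, but to_inner_uid is a method of Surprise's Trainset class. This is only the beginning of the code I am trying to reuse, and the other parts rely on the Trainset class as well.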
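One thing that may be worth checking for the error above is the type of the raw user id. As far as I know, Surprise keeps raw ids exactly as they appear in the DataFrame passed to load_from_df, so if the UserId column holds integers (an assumption, since the post does not show the data), looking up the string '1000115' would fail even though the user is present. The sketch below imitates that lookup with a plain dictionary; `ToyTrainset` is a hypothetical stand-in, not Surprise's Trainset class.

```python
class ToyTrainset:
    """Hypothetical stand-in for a trainset's raw-id -> inner-id lookup."""

    def __init__(self, raw_user_ids):
        # Inner ids are assigned in order of first appearance, starting at 0.
        self._raw2inner = {}
        for raw in raw_user_ids:
            self._raw2inner.setdefault(raw, len(self._raw2inner))

    def to_inner_uid(self, raw_uid):
        try:
            return self._raw2inner[raw_uid]
        except KeyError:
            raise ValueError(f"User {raw_uid} is not part of the trainset.")

# Raw ids stored as integers, as they would be for an integer UserId column.
toy = ToyTrainset([1000115, 1000200, 1000115])

inner = toy.to_inner_uid(1000115)  # integer lookup succeeds

try:
    toy.to_inner_uid('1000115')    # string lookup: different key, same printed value
    string_lookup_failed = False
    message = ''
except ValueError as err:
    string_lookup_failed = True
    message = str(err)

print(inner, string_lookup_failed, message)
```

If this is the cause, either passing the id with the same type as the column (for example testSubject = 1000115) or casting the column to str before loading would make the lookup consistent.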

The complete code that I am trying to adapt to my own data is as follows:

from MovieLens import MovieLens
from surprise import KNNBasic
import heapq
from collections import defaultdict
from operator import itemgetter
        
testSubject = '85'
k = 10

# Load our data set and compute the user similarity matrix
ml = MovieLens()
data = ml.loadMovieLensLatestSmall()

trainSet = data.build_full_trainset()

sim_options = {'name': 'cosine',
               'user_based': True
               }

model = KNNBasic(sim_options=sim_options)
model.fit(trainSet)
simsMatrix = model.compute_similarities()

# Get top N similar users to our test subject
# (Alternate approach would be to select users up to some similarity threshold - try it!)
testUserInnerID = trainSet.to_inner_uid(testSubject)
similarityRow = simsMatrix[testUserInnerID]

similarUsers = []
for innerID, score in enumerate(similarityRow):
    if (innerID != testUserInnerID):
        similarUsers.append( (innerID, score) )

kNeighbors = heapq.nlargest(k, similarUsers, key=lambda t: t[1])

# Get the stuff they rated, and add up ratings for each item, weighted by user similarity
candidates = defaultdict(float)
for similarUser in kNeighbors:
    innerID = similarUser[0]
    userSimilarityScore = similarUser[1]
    theirRatings = trainSet.ur[innerID]
    for rating in theirRatings:
        candidates[rating[0]] += (rating[1] / 5.0) * userSimilarityScore
    
# Build a dictionary of stuff the user has already seen
watched = {}
for itemID, rating in trainSet.ur[testUserInnerID]:
    watched[itemID] = 1
    
# Get top-rated items from similar users:
pos = 0
for itemID, ratingSum in sorted(candidates.items(), key=itemgetter(1), reverse=True):
    if itemID not in watched:
        movieID = trainSet.to_raw_iid(itemID)
        print(ml.getMovieName(int(movieID)), ratingSum)
        pos += 1
        if (pos > 10):
            break
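The steps in the listing above (pick the k most similar users, sum their ratings weighted by similarity, drop what the target user has already rated) can be sketched without MovieLens or Surprise. This is a toy version under stated assumptions: the function `recommend`, its parameters, and the data are invented for illustration, and `user_ratings` imitates the (item, rating) lists that trainSet.ur provides.

```python
import heapq
from collections import defaultdict

def recommend(user_ratings, sims_row, target, k=2, top_n=3, max_rating=5.0):
    """Toy user-based top-N recommender mirroring the listing above.

    user_ratings: inner user id -> list of (item, rating) pairs
    sims_row:     similarity of every user to the target, indexed by inner id
    """
    # Every user except the target, paired with its similarity score.
    neighbors = [(u, s) for u, s in enumerate(sims_row) if u != target]
    k_neighbors = heapq.nlargest(k, neighbors, key=lambda t: t[1])

    # Sum normalized ratings, weighted by the neighbor's similarity.
    candidates = defaultdict(float)
    for u, s in k_neighbors:
        for item, rating in user_ratings[u]:
            candidates[item] += (rating / max_rating) * s

    # Exclude items the target user has already rated.
    seen = {item for item, _ in user_ratings[target]}
    ranked = sorted(candidates.items(), key=lambda t: t[1], reverse=True)
    return [(item, score) for item, score in ranked if item not in seen][:top_n]

user_ratings = {
    0: [('A', 5.0)],
    1: [('A', 4.0), ('B', 5.0)],
    2: [('B', 2.5), ('C', 5.0)],
    3: [('D', 5.0)],
}
sims_row = [1.0, 0.8, 0.4, 0.1]  # similarities of users 0..3 to user 0

recs = recommend(user_ratings, sims_row, target=0)
print(recs)
```

Nothing here depends on MovieLens itself, so the same flow should carry over to any dataset once the trainset lookups use raw ids of the right type and the movie-name lookup is replaced by something that makes sense for the products.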

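So I would really appreciate it if someone could help me either use my full dataset as the trainset, or adapt the complete recommendation code above so that it works with my own data.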

