Implementing user-user collaborative filtering from scratch

Published 2024-09-27 04:11:40

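I am trying to implement the user-user collaborative filtering algorithm on the MovieLens-100k dataset, following the formula from an Analytics Vidhya tutorial. I am having trouble vectorizing the algorithm with NumPy, and I have spent almost two days trying to get the matrix dimensions right. What I have done so far: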

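  1. Imported the MovieLens dataset and created the user-item matrix. There are 943 unique users and 1682 unique movies.

  2. For clarity, I slice the matrix down to the first 10 users and the first 5 movies: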
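import numpy as np
from sklearn.metrics import pairwise_distances  # used for the cosine similarity below

tst_rating = user_item_matrix[0:10,0:5]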
  3. I center the ratings around 0:
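# Sum of ratings and number of rated (non-zero) items per user
tot_ratings_per_user = np.sum(tst_rating,axis=1)
num_ratings_per_user = ((tst_rating != 0).sum(1))
# Mean rating per user; users with no ratings get 0 instead of a division by zero
avg_rating_per_user = np.divide(tot_ratings_per_user,num_ratings_per_user,out=np.zeros_like(tot_ratings_per_user),where=num_ratings_per_user != 0)
# Reshape to a column vector so the subtraction broadcasts across the items
avg_rating_per_user = np.reshape(avg_rating_per_user,(avg_rating_per_user.shape[0],-1))
tst_rating = tst_rating - avg_rating_per_user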
  4. Compute the cosine similarity on the centered ratings:
user_sim = 1 - pairwise_distances(tst_rating, metric='cosine')
  5. Try to predict the ratings of all items for user 1:
def predict_rating(user,tst_rating,user_sim):
    print('--------------------Calculation of rating predictions for user {}--------------------'.format(user))
    u_id = user - 1
    user_idxs = np.arange(tst_rating.shape[0])
    user_idxs = np.delete(user_idxs,u_id,axis=0)
    num_other_users = user_idxs.shape[0]

    A = user_sim[u_id,user_idxs]
    A = np.reshape(A,(-1,A.shape[0]))
    print('User similarity {} between user {} and rest of the users:\n{}'.format(A.shape,user,A))
    input()
    B = tst_rating[user_idxs,:]
    print('Ratings {} for all items for all users except user {}\n{}'.format(B.shape,user,B))

    input()
    numer = np.dot(A,B)
    denom = A * num_other_users

    print('NUMERATOR {} = {} x \n{} = \n{}'.format(numer.shape,A,B,numer))
    print('DENOMINATOR {} = {} x \n{} = \n{}'.format(denom.shape,A,num_other_users,denom))
    input()
    user_ratings = np.divide(numer,denom,out=np.zeros_like(numer),where=denom != 0)
    print('NUMERATOR/DENOMINATOR = {}'.format(user_ratings))


predict_rating(1,tst_rating,user_sim)
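
For readers who want to follow the matrix shapes without the dataset, steps 1, 3 and 4 above can be sketched on toy data. This is only a rough illustration: the `triplets` array is made up, the cosine similarity is written out by hand rather than taken from `pairwise_distances`, and centering only the rated cells is an assumption (the code above subtracts the mean from every cell, so unrated cells become `-mean`).

```python
import numpy as np

# Hypothetical toy data: (user_id, item_id, rating) rows with 1-based ids,
# the column order of the first three fields in MovieLens-100k's u.data.
triplets = np.array([
    [1, 1, 5], [1, 3, 3],
    [2, 1, 4], [2, 3, 2],
    [3, 2, 4],
])

# Step 1: user-item matrix, rows = users, columns = items, 0 = not rated.
n_users, n_items = triplets[:, 0].max(), triplets[:, 1].max()
user_item_matrix = np.zeros((n_users, n_items), dtype=float)
user_item_matrix[triplets[:, 0] - 1, triplets[:, 1] - 1] = triplets[:, 2]

# Step 3: per-user mean over rated items only, kept as a column vector
# (keepdims=True) so it broadcasts across the item axis.
rated = user_item_matrix != 0
counts = rated.sum(axis=1, keepdims=True)
totals = user_item_matrix.sum(axis=1, keepdims=True)
means = np.divide(totals, counts, out=np.zeros_like(totals), where=counts != 0)

# Assumption: only rated cells are centered; unrated cells stay 0.
centered = np.where(rated, user_item_matrix - means, 0.0)

# Step 4: cosine similarity between rows, written out with NumPy.
norms = np.linalg.norm(centered, axis=1, keepdims=True)
unit = centered / np.where(norms == 0, 1.0, norms)  # all-zero rows stay zero
user_sim = unit @ unit.T

print(user_item_matrix)
print(means.ravel())
print(np.round(user_sim, 3))
```

Using a float matrix matters here: with an integer matrix, `np.zeros_like(totals)` would also be integer and could not hold the fractional means.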

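The problem is that it complains about mismatched matrix dimensions between the numerator and the denominator at the np.divide() step that computes user_ratings: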

user_ratings = np.divide(numer,denom,out=np.zeros_like(numer),where=denom != 0)
ValueError: operands could not be broadcast together with shapes (1,5) (1,9) (1,5) (1,9)

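This happens because the numerator has shape (1,5) and the denominator has shape (1,9). I can't figure out what exactly I am doing wrong when trying to compute the given formula in vectorized form. I would really appreciate any insight, help, or guidance on this.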
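
Tracing the shapes may help: `A` is (1,9), `B` is (9,5), so `np.dot(A,B)` is (1,5), while `A * num_other_users` only scales `A` elementwise and stays (1,9). One possible reading, assuming the formula is the usual similarity-weighted average, is that the denominator should be a single sum over the other users' similarities rather than a vector. The sketch below follows that assumption; `predict_ratings` is a hypothetical rewrite, not the tutorial's code, and the use of absolute values in the denominator is a common variant rather than something stated in the question.

```python
import numpy as np

def predict_ratings(user, ratings, user_sim):
    """Similarity-weighted average of the other users' (centered) ratings.

    `user` is 1-based, as in the question. Assumption: the denominator is
    the sum of absolute similarities, one scalar per target user.
    """
    u_id = user - 1
    others = np.delete(np.arange(ratings.shape[0]), u_id)

    weights = user_sim[u_id, others]      # shape (n_others,)
    other_ratings = ratings[others, :]    # shape (n_others, n_items)

    numer = weights @ other_ratings       # shape (n_items,)
    denom = np.abs(weights).sum()         # scalar, broadcasts against numer
    if denom == 0:
        return np.zeros(ratings.shape[1])
    return numer / denom

# Random stand-ins with the question's shapes: 10 users, 5 items.
rng = np.random.default_rng(0)
ratings = rng.normal(size=(10, 5))
unit = ratings / np.linalg.norm(ratings, axis=1, keepdims=True)
user_sim = unit @ unit.T

pred = predict_ratings(1, ratings, user_sim)
print(pred.shape)  # (5,)
```

Because the inputs are mean-centered, the result is on the centered scale; adding the target user's mean back would presumably be needed to get values on the original rating scale.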

