Attaching label information when averaging predictions across users

>>> pred
[ 0.99 0.23 0.11 0.64 0.45 0.55 0.76 0.72 0.97 ]
>>> users
['User2' 'User3' 'User2' 'User3' 'User0' 'User1' 'User4' 'User4' 'User4']
>>> label
[ 1 0 1 0 0 1 0 0 0 ]

# assign integer indices to each unique user name, and get the total
# number of occurrences for each name
unq, idx, cnt = np.unique(user_data, return_inverse=True, return_counts=True)

# now sum the values of pred corresponding to each index value and
# divide to get the mean
predictions_user = np.bincount(idx, weights=pred) / cnt

1 Answer

Anonymous user

#1 · Posted 2024-09-27 07:28:47

You can do this by passing return_index=True to np.unique. From the docs:

return_index : bool, optional
If True, also return the indices of ar that result in the unique array.

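This gives you a set of indices into user_data, one per unique value in unq, each pointing at the first occurrence of that value. To get the label corresponding to each value in unq, index label_user with these indices. This assumes every row for a given user carries the same label: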

unq, idx, inv_idx, cnt = np.unique(user_data, return_index=True,
                                   return_inverse=True, return_counts=True)

print(unq)
# ['User0' 'User1' 'User2' 'User3' 'User4']

print(label_user[idx])
# [0 1 1 0 0]

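I have renamed the "inverse" index array to inv_idx to distinguish it from idx. The mean calculation from the question must therefore use inv_idx: np.bincount(inv_idx, weights=pred) / cnt.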
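Putting the pieces together, here is a minimal end-to-end sketch using the sample data from the question. The array names pred, user_data and label_user follow the snippets above. The names labels_per_user and labels_consistent are hypothetical, introduced only for illustration, and the consistency check is an addition that is not part of the original answer.

```python
import numpy as np

# Sample data from the question
pred = np.array([0.99, 0.23, 0.11, 0.64, 0.45, 0.55, 0.76, 0.72, 0.97])
user_data = np.array(['User2', 'User3', 'User2', 'User3', 'User0',
                      'User1', 'User4', 'User4', 'User4'])
label_user = np.array([1, 0, 1, 0, 0, 1, 0, 0, 0])

# idx:     position of the first occurrence of each unique user
# inv_idx: for every row, the position of its user within unq
# cnt:     number of rows per unique user
unq, idx, inv_idx, cnt = np.unique(user_data, return_index=True,
                                   return_inverse=True, return_counts=True)

# Per-user mean prediction: sum pred within each group, divide by group size
predictions_user = np.bincount(inv_idx, weights=pred) / cnt

# Per-user label, taken from the first row seen for each user
labels_per_user = label_user[idx]

# Optional sanity check (an addition): every row agrees with its user's label
labels_consistent = bool(np.all(label_user == labels_per_user[inv_idx]))

for name, mean, lab in zip(unq, predictions_user, labels_per_user):
    print(name, round(float(mean), 4), int(lab))
```

If labels_consistent comes out False, at least one user has conflicting labels, and taking the first occurrence would silently hide that.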

As with computing the mean for each unique user name, there is also a simple way to get the corresponding labels using pandas:

import pandas as pd

df = pd.DataFrame({'user_data':user_data, 'label_user':label_user})
print(df.groupby('user_data').label_user.unique())
# user_data
# User0        [0]
# User1        [1]
# User2        [1]
# User3        [0]
# User4        [0]
# Name: label_user, dtype: object
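The mean prediction and the label can also be collected in a single groupby pass. The following is a sketch rather than part of the original answer: it assumes each user has one consistent label, so taking the first label per group is safe, and the output column names mean_pred and label are hypothetical.

```python
import pandas as pd

# Sample data from the question, one row per prediction
df = pd.DataFrame({
    'user_data': ['User2', 'User3', 'User2', 'User3', 'User0',
                  'User1', 'User4', 'User4', 'User4'],
    'pred': [0.99, 0.23, 0.11, 0.64, 0.45, 0.55, 0.76, 0.72, 0.97],
    'label_user': [1, 0, 1, 0, 0, 1, 0, 0, 0],
})

# Named aggregation: one output column per (input column, function) pair
summary = df.groupby('user_data').agg(
    mean_pred=('pred', 'mean'),      # average prediction per user
    label=('label_user', 'first'),   # label of the first row per user
)
print(summary)
```

The result is a DataFrame indexed by user name, so the mean and the label for each user stay aligned without any manual index bookkeeping.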
