"Nested renamer is not supported": how do I rewrite this code?

Posted 2024-10-02 00:22:19

I am running the code below and get the error shown at the end of this question.

This is the code, which I found at https://github.com/statisticianinstilettos/recmetrics/blob/master/example.ipynb:

!pip install scipy
!pip install git+https://github.com/statisticianinstilettos/recmetrics

import pandas as pd
import numpy as np

import recmetrics
import matplotlib.pyplot as plt
from surprise import Reader, SVD, Dataset
from surprise.model_selection import train_test_split

from tensorflow import keras
from tensorflow.keras import layers
from pathlib import Path
from zipfile import ZipFile


# Download the actual data from http://files.grouplens.org/datasets/movielens/ml-latest-small.zip
# Use the ratings.csv file
movielens_data_file_url = (
    "http://files.grouplens.org/datasets/movielens/ml-latest-small.zip"
)
movielens_zipped_file = keras.utils.get_file(
    "ml-latest-small.zip", movielens_data_file_url, extract=False
)
keras_datasets_path = Path(movielens_zipped_file).parents[0]
movielens_dir = keras_datasets_path / "ml-latest-small"

# Only extract the data the first time the script is run.
if not movielens_dir.exists():
    # Named zip_file so the built-in zip() is not shadowed
    with ZipFile(movielens_zipped_file, "r") as zip_file:
        # Extract files
        print("Extracting all the files now...")
        zip_file.extractall(path=keras_datasets_path)
        print("Done!")

ratings_file = movielens_dir / "ratings.csv"
ratings = pd.read_csv(ratings_file)


ratings = ratings.query('rating >=3')
ratings.reset_index(drop=True, inplace=True)

#only consider ratings from users who have rated over n movies
n=1000
users = ratings.userId.value_counts()
users = users[users>n].index.tolist()

ratings = ratings.query('userId in @users')
print(ratings.shape)
ratings.head(3)

# get movie features

rated_movies = ratings.movieId.tolist()

movies_file = movielens_dir / "movies.csv"
movies = pd.read_csv(movies_file)
movies = movies.query('movieId in @rated_movies')
movies.set_index("movieId", inplace=True, drop=True)

movies = movies.genres.str.split("|", expand=True)
movies.reset_index(inplace=True)
movies = pd.melt(movies, id_vars='movieId', value_vars=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

movies.drop_duplicates("movieId", inplace=True)
movies.set_index('movieId', inplace=True)

movies = pd.get_dummies(movies.value)
#movies = movies[['Action', 'Romance', 'Western', 'Comedy', 'Crime']]
movies.head()

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(15, 7))
recmetrics.long_tail_plot(df=ratings, 
             item_id_column="movieId", 
             interaction_type="movie ratings", 
             percentage=0.5,
             x_labels=False)

#format data for surprise
reader = Reader(rating_scale=(0, 5))
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)
trainset, testset = train_test_split(data, test_size=0.25)

#train SVD recommender 
algo = SVD()
algo.fit(trainset)
#make predictions on test set. 
test = algo.test(testset)
test = pd.DataFrame(test)
test.drop("details", inplace=True, axis=1)
test.columns = ['userId', 'movieId', 'actual', 'cf_predictions']
test.head()
#evaluate model with MSE and RMSE
print(recmetrics.mse(test.actual, test.cf_predictions))
print(recmetrics.rmse(test.actual, test.cf_predictions))

def get_users_predictions(user_id, n, model):
    recommended_items = pd.DataFrame(model.loc[user_id])
    recommended_items.columns = ["predicted_rating"]
    recommended_items = recommended_items.sort_values('predicted_rating', ascending=False)    
    recommended_items = recommended_items.head(n)
    return recommended_items.index.tolist()
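# cf_model is not defined in the code as posted. The linked notebook builds it as a
# user x movie matrix of predicted ratings; the next line is an assumed reconstruction.
cf_model = test.pivot_table(index='userId', columns='movieId', values='cf_predictions').fillna(0)
#get example prediction
get_users_predictions(274, 10, cf_model)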
# This line raises the error
test = test.copy().groupby('userId')['movieId'].agg({'actual': (lambda x: list(set(x)))})

Error:

---------------------------------------------------------------------------
SpecificationError                        Traceback (most recent call last)
<ipython-input-36-91830a4ef799> in <module>()
      1 #format test data
----> 2 test = test.copy().groupby('userId')['movieId'].agg({'actual': (lambda x: list(set(x)))})
      3 
      4 
      5 

1 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/groupby/generic.py in _aggregate_multiple_funcs(self, arg)
    292             # GH 15931
    293             if isinstance(self._selected_obj, Series):
--> 294                 raise SpecificationError("nested renamer is not supported")
    295 
    296             columns = list(arg.keys())

SpecificationError: nested renamer is not supported

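I have read that passing a dict in braces {} to agg is no longer supported here. The syntax has changed, and unfortunately I cannot find a solution that applies to my problem. How can I fix this?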
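
For reference, the error can be reproduced without the MovieLens data. This is a minimal sketch, assuming pandas 1.0 or later; the frame `df` and its values are made up for illustration, and the exception is caught broadly only because its import path differs between pandas versions.

```python
import pandas as pd

# Made-up sample data, only to trigger the error
df = pd.DataFrame({"userId": list("aab"), "movieId": [1, 2, 3]})

error_name = None
try:
    # A dict passed to agg on a single selected column (a SeriesGroupBy)
    # is treated as a renamer, which pandas no longer accepts.
    df.groupby("userId")["movieId"].agg({"actual": lambda x: list(set(x))})
except Exception as exc:  # broad on purpose, see the note above
    error_name = type(exc).__name__
    print(error_name, "-", exc)
```

The failure depends only on the dict being applied to one selected column, not on the lambda or on the data.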


1 answer
User
Answer 1 · posted 2024-10-02 00:22:19

Use named aggregation:

test = pd.DataFrame({
         'movieId':[5,3,3,9,2,4,9],
         'userId':list('aaabbbb')
})

# Stored under a new name so the original test frame stays available for the second variant
result = test.groupby('userId').agg(actual=('movieId', lambda x: list(set(x))))
print(result)
           actual
userId           
a          [3, 5]
b       [9, 2, 4]
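
Named aggregation also works when the column is selected first, as in the failing line from the question. In that case the keyword is the output column name and the value is just the function. A minimal sketch, assuming pandas 0.25 or later; the name `actual_by_user` is illustrative, and `sorted` replaces `list` only to make the element order deterministic.

```python
import pandas as pd

test = pd.DataFrame({
    "movieId": [5, 3, 3, 9, 2, 4, 9],
    "userId": list("aaabbbb"),
})

# Keyword form on a SeriesGroupBy: output column name = aggregation function
actual_by_user = test.groupby("userId")["movieId"].agg(
    actual=lambda x: sorted(set(x))
)
print(actual_by_user)
```

The result is a DataFrame indexed by userId with a single column named actual.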

Or use a list of tuples:

test = test.groupby('userId')['movieId'].agg([('actual', lambda x: list(set(x)))])
print (test)
           actual
userId           
a          [3, 5]
b       [9, 2, 4]
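
Applied to a frame shaped like the one in the question (columns userId, movieId, actual, cf_predictions), the replacement for the failing line could look like the sketch below. The rows are hypothetical stand-ins, not MovieLens data, and `test_by_user` is an illustrative name; `sorted` is used instead of `list` only to keep the order stable.

```python
import pandas as pd

# Hypothetical stand-in for the `test` frame built in the question
test = pd.DataFrame({
    "userId": [274, 274, 274, 414, 414],
    "movieId": [10, 20, 10, 30, 20],
    "actual": [4.0, 3.5, 4.0, 5.0, 3.0],
    "cf_predictions": [3.8, 3.6, 3.9, 4.4, 3.2],
})

# One row per user, holding the unique movie IDs that user rated
test_by_user = test.groupby("userId").agg(
    actual=("movieId", lambda x: sorted(set(x)))
)
print(test_by_user)
```

Note that the new actual column holds lists of movie IDs, so it replaces the rating column of the same name from the ungrouped frame.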
