Pandas: counting identical values in a column, but coming from different indexes

Posted 2024-09-29 23:15:35


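I have a DataFrame of customer ratings for restaurants. rating_year is the year the rating was given, first_year is the year the restaurant opened, and last_year is the restaurant's last year of operation.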

  • What I want is to count, for each restaurant, how many restaurants opened in the same year, i.e. how many share the same first_year.

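The problem with my attempt is that I group by restaurant_id and count first_year, but this does not exclude the other rows that have the same id, and I don't know the syntax for doing that. Can anyone help?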

import pandas as pd

data = {'rating_id': ['1', '2','3','4','5','6','7','8','9'],
        'user_id': ['56', '13','56','99','99','13','12','88','45'],
        'restaurant_id':  ['xxx', 'xxx','yyy','yyy','xxx','zzz','zzz','eee','eee'],
        'star_rating': ['2.3', '3.7','1.2','5.0','1.0','3.2','1.0','2.2','0.2'],
        'rating_year': ['2012','2012','2020','2001','2020','2015','2000','2003','2004'],
        'first_year': ['2012', '2012','2001','2001','2012','2000','2000','2001','2001'],
        'last_year': ['2020', '2020','2020','2020','2020','2015','2015','2020','2020'],
        }


df = pd.DataFrame(data, columns=['rating_id','user_id','restaurant_id','star_rating','rating_year','first_year','last_year'])
df['star_rating'] = df['star_rating'].astype(float)

df['nb_rating'] = (
    df.groupby('restaurant_id')['rating_id'].transform('count')
)



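# Problem is here: this counts the rating rows of each restaurant,
# not the number of restaurants that share the same first_year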
df['nb_opened_sameYear'] = (
    df.groupby('restaurant_id')['first_year']
    .transform('count')
)

df.head(10)
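
One possible approach, offered as a sketch rather than a definitive answer: group by first_year instead of restaurant_id and count the distinct restaurant ids in each group, so that several ratings of the same restaurant are counted only once. The example below rebuilds just the two relevant columns from the question. The column nb_other_opened_sameYear is a hypothetical name introduced here, for the case where the restaurant itself should not be counted.

```python
import pandas as pd

# Only the two columns needed for the count, copied from the question's data
df = pd.DataFrame({
    'restaurant_id': ['xxx', 'xxx', 'yyy', 'yyy', 'xxx', 'zzz', 'zzz', 'eee', 'eee'],
    'first_year': ['2012', '2012', '2001', '2001', '2012', '2000', '2000', '2001', '2001'],
})

# Number of distinct restaurants per opening year, broadcast back to every row.
# 'nunique' ignores repeated ids, so multiple ratings of one restaurant count once.
df['nb_opened_sameYear'] = (
    df.groupby('first_year')['restaurant_id']
    .transform('nunique')
)

# Assumption: if the restaurant itself should be excluded, subtract one.
# (nb_other_opened_sameYear is a hypothetical column name.)
df['nb_other_opened_sameYear'] = df['nb_opened_sameYear'] - 1

print(df)
```

With this data, yyy and eee both opened in 2001, so their rows get 2 (or 1 when the restaurant itself is excluded), while xxx and zzz are the only restaurants for their opening years.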
