Pandas: merging and summing several value_counts Series

# month 1
id  userId  actionType
1   1       a
2   1       c
3   2       a
4   3       a
5   3       b

# month 2
id  userId  actionType
6   1       b
7   1       b
8   2       a
9   3       c

# count each user's actions and store the result in a new column
df1['count'] = df1.groupby(['userId'], sort=False)['id'].transform('count')
# drop the columns that are not needed
df1 = df1[['userId', 'count']]
# drop the rows that are not needed
df1 = df1.drop_duplicates(subset=['userId'])

# repeat for the second month
df2['count'] = df2.groupby(['userId'], sort=False)['id'].transform('count')
df2 = df2[['userId', 'count']]
df2 = df2.drop_duplicates(subset=['userId'])

# merge and sum up
print(pd.concat([df1, df2]).groupby(['userId'], sort=False).sum())
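For comparison, here is a minimal sketch of a shorter route to the same totals, assuming the two monthly tables are built exactly as shown in the question. It stacks both months first and counts rows per user once. The variable name `total` is illustrative and not part of the original post.

```python
import pandas as pd

# The two monthly tables from the question.
df1 = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                    'userId': [1, 1, 2, 3, 3],
                    'actionType': ['a', 'c', 'a', 'a', 'b']})
df2 = pd.DataFrame({'id': [6, 7, 8, 9],
                    'userId': [1, 1, 2, 3],
                    'actionType': ['b', 'b', 'a', 'c']})

# Stack both months, then count the rows for each user in one pass.
total = pd.concat([df1, df2]).groupby('userId', sort=False).size()
print(total)
```

This avoids the intermediate `count` columns and the `drop_duplicates` calls, at the cost of concatenating the full tables rather than the per-month summaries.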

3 answers

User

Answer 1 · edited 2024-06-28 19:23:22

You can directly add the Series produced by the value_counts method:

#create frames
df= pd.DataFrame({'User_id': ['a','a','b','c','c'],'a':[1,1,2,3,3]})
df1= pd.DataFrame({'User_id': ['a','a','b','b','c','c','c'],'a':[1,1,2,2,3,3,4]})

Sum the Series:

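# add the two value_counts() Series; pandas aligns them on User_id
print(df['User_id'].value_counts() + df1['User_id'].value_counts())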

Output:

a    4
b    3
c    5
dtype: int64

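User

Answer 2 · edited 2024-06-28 19:23:22

This is known as "split-apply-combine". With a lambda function like the one below, it takes one line and three or four clicks.

1️⃣ Paste this into your code: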

df['total_for_this_label'] = df.groupby('label', as_index=False)['label'].transform(lambda x: x.count())

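2️⃣ Replace the three occurrences of label with the name of the column whose values you are counting (case-sensitive)

3️⃣ Print df.head() to check that it worked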
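The three steps above can be sketched end to end as follows. The sample data is hypothetical, and the sketch passes the string `'count'` to `transform` rather than a lambda, which is assumed here to give the same per-group counts.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'label' comes from the answer.
df = pd.DataFrame({'label': ['x', 'y', 'x', 'z', 'x', 'y']})

# Count the rows in each group and broadcast that count back onto every row.
df['total_for_this_label'] = df.groupby('label')['label'].transform('count')

# Inspect the first rows to confirm the new column looks right.
print(df.head())
```

Because `transform` returns a result with the same index as the input, the new column lines up row by row with the original frame.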

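User

Answer 3 · edited 2024-06-28 19:23:22

I suggest using `add` with a fill value of 0. Compared with the answer suggested earlier, this has the advantage that it also works when the two DataFrames do not have identical sets of unique keys.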

# Create frames
df1= pd.DataFrame({'User_id': ['a','a','b','c','c','d'],'a':[1,1,2,3,3,5]})
df2= pd.DataFrame({'User_id': ['a','a','b','b','c','c','c'],'a': [1,1,2,2,3,3,4]})

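Now add the two sets of value_counts(). The fill_value argument handles any NaN values that would otherwise appear, in this case for 'd', which occurs in df1 but not in df2.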

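# add the two value_counts() Series, treating keys missing from one side as 0
print(df1['User_id'].value_counts().add(df2['User_id'].value_counts(), fill_value=0))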
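To make the difference concrete, here is a self-contained sketch that compares plain `+` with `add(fill_value=0)` on the frames from this answer. The names `counts1`, `counts2`, `plain_sum` and `filled_sum` are illustrative, and the final cast to `int` is an addition of this sketch, not part of the answer.

```python
import pandas as pd

df1 = pd.DataFrame({'User_id': ['a', 'a', 'b', 'c', 'c', 'd'],
                    'a': [1, 1, 2, 3, 3, 5]})
df2 = pd.DataFrame({'User_id': ['a', 'a', 'b', 'b', 'c', 'c', 'c'],
                    'a': [1, 1, 2, 2, 3, 3, 4]})

counts1 = df1['User_id'].value_counts()
counts2 = df2['User_id'].value_counts()

# Plain + aligns on the index and leaves NaN for 'd', which df2 lacks.
plain_sum = counts1 + counts2

# add(..., fill_value=0) treats the missing key as 0 instead.
# The aligned result is float, so cast back to int for clean counts.
filled_sum = counts1.add(counts2, fill_value=0).astype(int)

print(plain_sum.sort_index())
print(filled_sum.sort_index())
```

With `fill_value=0`, the count for 'd' is kept as 1 instead of turning into NaN.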
