Equivalent of SELECT COUNT(DISTINCT col1, col2) GROUP BY col3 in pandas

import pandas as pd

people = ['shayna', 'shayna', 'shayna', 'shayna', 'john']
dates = ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18']
places = ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital']
d = {'Person': people, 'Service_Date': dates, 'Site_Where_Served': places}
df = pd.DataFrame(d)
df

Person  Service_Date  Site_Where_Served
shayna  01-01-18      hospital
shayna  01-01-18      hospital
shayna  01-01-18      inpatient
shayna  01-02-18      hospital
john    01-02-18      hospital

3 Answers

User

Answer 1 · edited 2024-10-02 08:29:24

In my opinion, a better approach is to drop the duplicate rows before calling groupby.size:

res = df.drop_duplicates()\
        .groupby('Site_Where_Served').size()\
        .reset_index(name='Site_Visit_Count')

print(res)

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1
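One caveat worth sketching: calling `drop_duplicates()` with no arguments compares every column, which works here only because the frame holds exactly the three relevant columns. If the real data has more columns, passing `subset=` is probably what you want. The `Note` column below is a hypothetical addition for illustration and is not part of the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
    # Hypothetical extra column: every row is now unique as a whole.
    'Note': ['a', 'b', 'c', 'd', 'e'],
})

# Deduplicate only on the columns that define a distinct visit.
key_cols = ['Person', 'Service_Date', 'Site_Where_Served']
res = (df.drop_duplicates(subset=key_cols)
         .groupby('Site_Where_Served')
         .size()
         .reset_index(name='Site_Visit_Count'))
print(res)
```

Without `subset=`, the extra column would make all five rows distinct and the hospital count would come out as 4 instead of 3.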

User

Answer 2 · edited 2024-10-02 08:29:24

Perhaps value_counts:

(df.drop_duplicates()
   .Site_Where_Served
   .value_counts()
   .to_frame('Site_Visit_Count')
   .rename_axis('Site_Where_Served')
   .reset_index()
)

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1
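A small difference from the `groupby` version that may matter: `value_counts` orders the result by count (descending), while `groupby` orders it by group key. The two happen to coincide for the question's data. The frame below is made-up data, used only to show the two orderings diverging.

```python
import pandas as pd

# Hypothetical, already de-duplicated data: 'clinic' sorts first
# alphabetically but has the smaller count.
sites = pd.DataFrame({
    'Site_Where_Served': ['hospital', 'clinic', 'hospital'],
})

by_count = sites['Site_Where_Served'].value_counts()   # ordered by count
by_key = sites.groupby('Site_Where_Served').size()     # ordered by key

print(by_count.index.tolist())
print(by_key.index.tolist())
```

If a key-ordered result is needed from `value_counts`, appending `.sort_index()` should give the same ordering as `groupby`.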

User

Answer 3 · edited 2024-10-02 08:29:24

`drop_duplicates` with `groupby` + `count`

(df.drop_duplicates()
   .groupby('Site_Where_Served')
   .Site_Where_Served.count()
   .reset_index(name='Site_Visit_Count')
)

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1

Note that one small difference between count and size is that count does not include NaN entries, whereas size does.
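The NaN difference can be seen in a minimal sketch. The frame below is made-up data with one missing `Person` value; it is not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one of the two hospital rows has a missing Person.
df_nan = pd.DataFrame({
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient'],
    'Person': ['shayna', np.nan, 'john'],
})

grouped = df_nan.groupby('Site_Where_Served')['Person']
sizes = grouped.size()    # counts rows, NaN included
counts = grouped.count()  # counts non-NaN values only

print(sizes)
print(counts)
```

Here `size` reports 2 rows for hospital while `count` reports 1, so the choice matters whenever the counted column can contain missing values.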

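Tuplize, `groupby` and `nunique`

This really just fixes your current solution, but I don't recommend it: it is verbose and takes more steps than necessary. First convert the two columns into tuples, then group by Site_Where_Served and count the unique tuples: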

(df[['Person', 'Service_Date']]
   .apply(tuple, 1)
   .groupby(df.Site_Where_Served)
   .nunique()
   .reset_index(name='Site_Visit_Count')
)

  Site_Where_Served  Site_Visit_Count
0          hospital                 3
1         inpatient                 1
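As a quick sanity check, the tuple-based approach and the `drop_duplicates` approach can be compared side by side on the question's data. This is only a sketch; the names `via_tuples` and `via_dedup` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Person': ['shayna', 'shayna', 'shayna', 'shayna', 'john'],
    'Service_Date': ['01-01-18', '01-01-18', '01-01-18', '01-02-18', '01-02-18'],
    'Site_Where_Served': ['hospital', 'hospital', 'inpatient', 'hospital', 'hospital'],
})

# Distinct (Person, Service_Date) pairs per site, via tuples.
via_tuples = (df[['Person', 'Service_Date']]
                .apply(tuple, axis=1)
                .groupby(df['Site_Where_Served'])
                .nunique())

# Same counts, via dropping duplicate rows first.
via_dedup = df.drop_duplicates().groupby('Site_Where_Served').size()

print(via_tuples.tolist() == via_dedup.tolist())
```

Both routes group on the same key, so the resulting indexes line up and the counts can be compared directly.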
