Group by, count by time, then sort within groups using Pandas

                     EventID Institution_Name
TimeCreated                                  
2021-03-22 15:34:46       40               H1
2021-03-22 18:17:19       40               H2
2021-03-22 20:37:47       40               H2
2021-03-22 20:40:20       40               H2
2021-03-22 21:37:32       40               H2
2021-03-22 22:16:32       40               H2
2021-03-22 23:19:49       40               H2
2021-03-22 23:26:40       40               H2
2021-03-23 00:26:03       40               H3
2021-03-23 01:25:43       40               H4
2021-03-23 04:00:24       40               H5
2021-03-23 13:09:42       40               H6
2021-03-23 13:13:23       40               H1
2021-03-23 15:49:33       40               H7
2021-03-23 17:22:30       40               H8
2021-03-23 17:22:37       40               H8
2021-03-23 17:23:49       40               H9
2021-03-23 18:19:56       40               H2
2021-03-23 18:22:14       40               H2
2021-03-23 18:52:36       40              H10

grouper = df.groupby([pd.Grouper(freq='1D'), 'Institution_Name'])
grouper['EventID'].count().reset_index().sort_values(['TimeCreated'], ascending=True).sort_values('EventID', ascending=False).head(5)

But this does not give the desired result.
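To try the answers below, the sample data can be rebuilt roughly as follows. This is only a sketch: the CSV layout and the choice to make TimeCreated the index are assumptions based on how the table is printed in the question. Calling df.reset_index() gives the variant in which TimeCreated is an ordinary column.

```python
import io

import pandas as pd

# Rows copied from the question; the CSV layout itself is an assumption.
raw = """TimeCreated,EventID,Institution_Name
2021-03-22 15:34:46,40,H1
2021-03-22 18:17:19,40,H2
2021-03-22 20:37:47,40,H2
2021-03-22 20:40:20,40,H2
2021-03-22 21:37:32,40,H2
2021-03-22 22:16:32,40,H2
2021-03-22 23:19:49,40,H2
2021-03-22 23:26:40,40,H2
2021-03-23 00:26:03,40,H3
2021-03-23 01:25:43,40,H4
2021-03-23 04:00:24,40,H5
2021-03-23 13:09:42,40,H6
2021-03-23 13:13:23,40,H1
2021-03-23 15:49:33,40,H7
2021-03-23 17:22:30,40,H8
2021-03-23 17:22:37,40,H8
2021-03-23 17:23:49,40,H9
2021-03-23 18:19:56,40,H2
2021-03-23 18:22:14,40,H2
2021-03-23 18:52:36,40,H10
"""

# Parse the timestamps and use them as a DatetimeIndex named TimeCreated.
df = pd.read_csv(
    io.StringIO(raw),
    parse_dates=["TimeCreated"],
    index_col="TimeCreated",
)

print(df.head())
```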

3 answers

User

Answer 1 · edited 2024-09-27 07:26:37

Group by the two columns

grouper = df.groupby([pd.Grouper(key='TimeCreated', freq='1D'), 'Institution_Name'])

grouper = grouper.count().groupby('TimeCreated', group_keys=False)

Sort the elements (the counts) within each date group

grouper_count_desc = grouper.apply(lambda x: x.sort_values(by='EventID', ascending=False))

In[65]: grouper_count_desc
Out[65]: 
                              EventID
TimeCreated Institution_Name         
2021-03-22  H2                      7
            H1                      1
2021-03-23  H2                      2
            H8                      2
            H1                      1
            H10                     1
            H3                      1
            H4                      1
            H5                      1
            H6                      1
            H7                      1
            H9                      1

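Sort the date groups. The order of the elements within each group is meant to stay unchanged; because the default sort algorithm of sort_values (quicksort) is not a stable sort, pass kind='stable' to guarantee this.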

grouper_date_asc = grouper_count_desc.sort_values(by='TimeCreated', ascending=True)

In[70]: grouper_date_desc = grouper_count_desc.sort_values(by='TimeCreated', ascending=False) # to show result, I used descending
In[71]: grouper_date_desc
Out[71]: 
                              EventID
TimeCreated Institution_Name         
2021-03-23  H2                      2
            H8                      2
            H1                      1
            H10                     1
            H3                      1
            H4                      1
            H5                      1
            H6                      1
            H7                      1
            H9                      1
2021-03-22  H2                      7
            H1                      1

Reset the index and display the result

print(grouper_date_asc.reset_index())
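The two-step idea above (order by count, then order the days without disturbing the count order) can also be sketched without apply, using two stable sorts. This is a minimal illustration on a small made-up frame, not the answerer's code; the sample rows and the variable names counts, by_count and result are assumptions.

```python
import pandas as pd

# Small hypothetical sample with TimeCreated as a regular column.
df = pd.DataFrame({
    "TimeCreated": pd.to_datetime([
        "2021-03-22 15:34:46", "2021-03-22 18:17:19", "2021-03-22 20:37:47",
        "2021-03-22 20:40:20", "2021-03-23 13:13:23", "2021-03-23 17:22:30",
        "2021-03-23 17:22:37", "2021-03-23 18:19:56", "2021-03-23 18:22:14",
    ]),
    "EventID": 40,
    "Institution_Name": ["H1", "H2", "H2", "H2", "H1", "H8", "H8", "H2", "H2"],
})

# Count events per (day, institution).
counts = df.groupby(
    [pd.Grouper(key="TimeCreated", freq="1D"), "Institution_Name"]
).count()

# Step 1: order all rows by count, largest first.
by_count = counts.sort_values("EventID", ascending=False, kind="stable")

# Step 2: order by day with a stable sort, so the count order
# established in step 1 survives inside each day.
result = by_count.sort_values("TimeCreated", kind="stable").reset_index()

print(result)
```

A stable sort keeps rows with equal keys in their existing relative order, which is what makes the second sort safe here.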

User

Answer 2 · edited 2024-09-27 07:26:37

You can use dt.normalize() to get the date to group by. Aggregate the counts with agg(), then sort the columns, as follows:

(df.groupby([df['TimeCreated'].dt.normalize(),
             'Institution_Name'])
   .agg(EventID_count=('EventID', 'count'))
   .reset_index()
   .sort_values(['TimeCreated', 'Institution_Name'], ascending=[True, False], ignore_index=True)
)

If TimeCreated is the index, you can use df.index.normalize() instead, as follows:

(df.groupby([df.index.normalize(),
             'Institution_Name'])
   .agg(EventID_count=('EventID', 'count'))
   .reset_index()
   .sort_values(['TimeCreated', 'Institution_Name'], ascending=[True, False], ignore_index=True)
)

Result:

   TimeCreated Institution_Name  EventID_count
0   2021-03-22               H2              7
1   2021-03-22               H1              1
2   2021-03-23               H9              1
3   2021-03-23               H8              2
4   2021-03-23               H7              1
5   2021-03-23               H6              1
6   2021-03-23               H5              1
7   2021-03-23               H4              1
8   2021-03-23               H3              1
9   2021-03-23               H2              2
10  2021-03-23              H10              1
11  2021-03-23               H1              1

Your code is actually very close (because TimeCreated is the index); you only need to change how the columns are sorted, as follows:

grouper = df.groupby([pd.Grouper(freq='1D'), 'Institution_Name'])
grouper['EventID'].count().reset_index().sort_values(['TimeCreated', 'Institution_Name'], ascending=[True, False], ignore_index=True)

This code produces the same result as above, except that the count column is still named EventID rather than EventID_count.
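Note that the result above is ordered by Institution_Name in descending string order (which is why H10 lands between H2 and H1), not by the count itself. If the intent is to rank institutions by count within each day, a possible variation is to sort on the count column instead. The sketch below assumes TimeCreated is the index and uses a small made-up sample; the ascending name order used as a tie-breaker is also an assumption.

```python
import pandas as pd

# Small hypothetical sample with TimeCreated as the index.
df = pd.DataFrame(
    {
        "EventID": 40,
        "Institution_Name": ["H1", "H2", "H2", "H9", "H8", "H8", "H10"],
    },
    index=pd.DatetimeIndex(
        [
            "2021-03-22 15:34:46", "2021-03-22 18:17:19", "2021-03-22 20:37:47",
            "2021-03-23 17:23:49", "2021-03-23 17:22:30", "2021-03-23 17:22:37",
            "2021-03-23 18:52:36",
        ],
        name="TimeCreated",
    ),
)

result = (
    df.groupby([df.index.normalize(), "Institution_Name"])
      .agg(EventID_count=("EventID", "count"))
      .reset_index()
      # Day ascending, count descending, name ascending to break ties.
      .sort_values(
          ["TimeCreated", "EventID_count", "Institution_Name"],
          ascending=[True, False, True],
          ignore_index=True,
      )
)

print(result)
```

Because the names are compared as strings, H10 still sorts before H9 among the ties; a numeric key would be needed for a natural ordering of the names.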

User

Answer 3 · edited 2024-09-27 07:26:37

You can use dt.floor:

(df.groupby([df['TimeCreated'].dt.floor('d'),
             'Institution_Name'])
 [['EventID']].count()
 .add_suffix('_count')
 .sort_values(['TimeCreated', 'Institution_Name'], ascending=[True, False])
 .reset_index()
)

Output:

   TimeCreated Institution_Name  EventID_count
0   2021-03-22               H2              7
1   2021-03-22               H1              1
2   2021-03-23               H9              1
3   2021-03-23               H8              2
4   2021-03-23               H7              1
5   2021-03-23               H6              1
6   2021-03-23               H5              1
7   2021-03-23               H4              1
8   2021-03-23               H3              1
9   2021-03-23               H2              2
10  2021-03-23              H10              1
11  2021-03-23               H1              1
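For timestamps like the ones in the question, dt.floor with a daily frequency and dt.normalize() (used in the previous answer) should produce the same grouping key, which is why both answers arrive at the same table. A quick check on a few made-up values, where the series name ts is purely illustrative:

```python
import pandas as pd

ts = pd.Series(
    pd.to_datetime([
        "2021-03-22 15:34:46",
        "2021-03-22 23:26:40",
        "2021-03-23 00:26:03",
    ]),
    name="TimeCreated",
)

by_normalize = ts.dt.normalize()  # set the time part to midnight
by_floor = ts.dt.floor("D")       # round down to the start of the day

# Both give midnight of the same calendar day for every row.
print(by_normalize.equals(by_floor))
```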

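Your original attempt did not work because Grouper does not know where to find your dates (by default, it uses the index). Here are two ways to fix this:

Specify the column name: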

(df.groupby([pd.Grouper(freq='1D', key='TimeCreated'),
             'Institution_Name'])
   [['EventID']].count()
   .add_suffix('_count')
   .sort_values(['TimeCreated', 'Institution_Name'], ascending=[True, False])
   .reset_index()
)

Use the column as the index:

(df.set_index('TimeCreated')
   .groupby([pd.Grouper(freq='1D'),
             'Institution_Name'])
   [['EventID']].count()
   .add_suffix('_count')
   .sort_values(['TimeCreated', 'Institution_Name'], ascending=[True, False])
   .reset_index()
)
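The failure described in this answer can be reproduced on a frame where TimeCreated is a regular column and the index is the default integer range. The following is a sketch with made-up rows; the names error and fixed are illustrative only.

```python
import pandas as pd

# Hypothetical frame: TimeCreated is a column, the index is a RangeIndex.
df = pd.DataFrame({
    "TimeCreated": pd.to_datetime([
        "2021-03-22 15:34:46",
        "2021-03-22 18:17:19",
        "2021-03-23 00:26:03",
    ]),
    "EventID": 40,
    "Institution_Name": ["H1", "H2", "H3"],
})

# Without key=, Grouper tries to bin the index, which holds no datetimes.
try:
    df.groupby([pd.Grouper(freq="1D"), "Institution_Name"])["EventID"].count()
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__, error)

# Pointing Grouper at the column makes the same call work.
fixed = df.groupby(
    [pd.Grouper(key="TimeCreated", freq="1D"), "Institution_Name"]
)["EventID"].count()
print(fixed)
```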
