I have a DataFrame like the one shown below:
InsuranceId InsuranceStatus Date
0 Ins1234 DuePayment 2020-06-07 23:59:43.123456+00:00
1 Ins1234 Successful 2019-06-07 23:59:43.123456+00:00
2 Ins1234 Successful 2018-06-07 23:59:43.123456+00:00
3 Ins5678 DuePayment 2020-07-07 22:59:32.123421+00:00
4 Ins5678 Successful 2019-07-07 22:59:32.123421+00:00
5 Ins5678 Successful 2018-07-07 22:59:32.123421+00:00
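I am trying to create a row number / rank within each InsuranceId group, based on max(Date), so that the latest date in each group gets rank 1: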
df['RowNum'] = df.groupby('InsuranceId')['InsuranceStatus']['Date'].rank(method="first", ascending=True)
and
df['RowNum'] = df.groupby(by=['InsuranceId'])['InsuranceStatus']['Date'].transform(lambda x: x.rank())
I wrote these by following the question "SQL-like window functions in PANDAS: Row Numbering in Python Pandas Dataframe".
Error: Index Error: Columns status already selected
The expected output I am trying to get is below:
InsuranceId InsuranceStatus Date RowNum
0 Ins1234 DuePayment 2020-06-07 23:59:43.123456+00:00 1
1 Ins1234 Successful 2019-06-07 23:59:43.123456+00:00 2
2 Ins1234 Successful 2018-06-07 23:59:43.123456+00:00 3
3 Ins5678 DuePayment 2020-07-07 22:59:32.123421+00:00 1
4 Ins5678 Successful 2019-07-07 22:59:32.123421+00:00 2
5 Ins5678 Successful 2018-07-07 22:59:32.123421+00:00 3
Is there something I am missing? Any suggestions?
Final output:
InsuranceId InsuranceStatus Date
Ins1234 DuePayment 2020-06-07 23:59:43.123456+00:00
Ins5678 DuePayment 2020-07-07 22:59:32.123421+00:00
Use ^{}. Just pass the values you want to group by, and sort on the column that needs to be ordered.
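As a minimal sketch of the "group, then sort" idea, here is one way it could be written. It is not necessarily the exact call the answer had in mind. It assumes the Date column is parsed as timezone-aware datetimes, and the variable name `latest` is only illustrative. The error in the question comes from indexing the grouped object twice (`['InsuranceStatus']['Date']`): once a column has been selected on a groupby, selecting another raises IndexError. Sorting first and numbering the rows within each group avoids that chained selection.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "InsuranceId": ["Ins1234"] * 3 + ["Ins5678"] * 3,
    "InsuranceStatus": ["DuePayment", "Successful", "Successful"] * 2,
    "Date": pd.to_datetime([
        "2020-06-07 23:59:43.123456+00:00",
        "2019-06-07 23:59:43.123456+00:00",
        "2018-06-07 23:59:43.123456+00:00",
        "2020-07-07 22:59:32.123421+00:00",
        "2019-07-07 22:59:32.123421+00:00",
        "2018-07-07 22:59:32.123421+00:00",
    ], utc=True),
})

# Sort newest-first, then number the rows inside each InsuranceId group.
# cumcount() starts at 0, so add 1; assignment aligns on the original index.
df["RowNum"] = (
    df.sort_values("Date", ascending=False)
      .groupby("InsuranceId")
      .cumcount() + 1
)

# Keep only the most recent row per InsuranceId (the "final output").
latest = df.loc[df["RowNum"] == 1, ["InsuranceId", "InsuranceStatus", "Date"]]
print(df)
print(latest)
```

If two rows in a group share the same Date, `cumcount` numbers them in whatever order the sort leaves them, so a secondary sort key may be needed for a deterministic result.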
Output: