My data is in the following DataFrame:
import pandas as pd

# One row per (account, account type) pair
df = pd.DataFrame({'AccID': ['001', '001', '001', '002', '002', '003'],
                   'AccTypes': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Status': ['Closed', 'Active', 'Active', 'Active', 'Closed', 'Active'],
                   'Years': [5, 15, 10, 20, 25, 30]})
AccID AccTypes Status Years
001 A Closed 5
001 B Active 15
001 C Active 10
002 A Active 20
002 B Closed 25
003 C Active 30
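I want to create another column named "ActiveYears", where each value is the maximum number of years among the active accounts of the given AccID, regardless of account type. The expected output is as follows: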
AccID AccTypes Status Years ActiveYears Explanations
001 A Closed 5 5 # Status = Closed, we set ActiveYears = Years
001 B Active 15 15 # Status = Active, we select the maximum year of AccID = 001 with active status
001 C Active 10 15 # Status = Active, we select the maximum year of AccID = 001 with active status
002 A Active 20 20 # Status = Active, we select the maximum year of AccID = 002 with active status
002 B Closed 25 25 # Status = Closed, we set ActiveYears = Years
003 C Active 30 30 # Status = Active, we select the maximum year of AccID = 003 with active status
I can do this with a loop, but that is not elegant. Is there a better way to do it than looping? Thank you very much.
You can use the following approach: first handle the rows whose Status is Closed, then use a groupby transform to handle the Active rows.
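The answer's own code did not survive on this page, so what follows is only a minimal sketch of the two steps it describes, not the answerer's original snippet. It assumes that Closed rows simply keep their own Years (as the Explanations column states) and that the per-account maximum is taken over Active rows only. The name `active` is a helper introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'AccID': ['001', '001', '001', '002', '002', '003'],
    'AccTypes': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Status': ['Closed', 'Active', 'Active', 'Active', 'Closed', 'Active'],
    'Years': [5, 15, 10, 20, 25, 30],
})

# Step 1: start from Years, so Closed rows keep their own value (assumption).
df['ActiveYears'] = df['Years']

# Step 2: for Active rows only, broadcast the per-account maximum of Years.
active = df['Status'].eq('Active')  # hypothetical helper mask
df.loc[active, 'ActiveYears'] = (
    df.loc[active].groupby('AccID')['Years'].transform('max')
)

print(df['ActiveYears'].tolist())  # [5, 15, 15, 20, 25, 30]
```

Because `transform('max')` returns a Series aligned to the index of the filtered rows, the assignment through `df.loc[active, ...]` writes each group maximum back to the matching rows without an explicit loop.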