How to mark the last duplicate element in a DataFrame

Id  Policy_id  Start_Date  Last_dup
0   b123       2019/02/24  0
1   b123       2019/03/24  0
2   b123       2019/04/24  1
3   c123       2018/09/01  0
4   c123       2018/10/01  1
5   d123       2017/02/24  0
6   d123       2017/03/24  1
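To reproduce the example, the frame above can be built as follows. This is a sketch: Start_Date is assumed to be stored as strings, as it appears in the listing, and Last_dup is the expected result, with 1 marking the last row of each Policy_id.

```python
import pandas as pd

# Sample data from the question; Last_dup is the expected flag
# (1 = last row for that Policy_id, 0 = an earlier duplicate)
df = pd.DataFrame({
    'Id': [0, 1, 2, 3, 4, 5, 6],
    'Policy_id': ['b123', 'b123', 'b123', 'c123', 'c123', 'd123', 'd123'],
    'Start_Date': ['2019/02/24', '2019/03/24', '2019/04/24',
                   '2018/09/01', '2018/10/01', '2017/02/24', '2017/03/24'],
    'Last_dup': [0, 0, 1, 0, 1, 0, 1],
})
print(df)
```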

2 answers

User

Answer 1 · edited 2024-09-30 05:22:09

This can also be done with the method below (without using Series.duplicated):

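# Policy_id -> Id; a repeated key is overwritten by the later row,
# so each value is the Id of the most recent row for that Policy_id
dictionary = df.set_index('Policy_id')['Id'].to_dict()
# Build the set once instead of a new list per row (assumes Id values are unique)
last_ids = set(dictionary.values())
df['Last_dup'] = df['Id'].apply(lambda x: 1 if x in last_ids else 0)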
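Another way to avoid Series.duplicated is to number the rows inside each Policy_id from the end with GroupBy.cumcount(ascending=False), so the last row of each group gets 0. This is a different technique from the dictionary approach above, shown as a sketch; like the other answers, it assumes the rows are already in the intended order. Unlike the dictionary lookup, it does not rely on Id being unique.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [0, 1, 2, 3, 4, 5, 6],
    'Policy_id': ['b123', 'b123', 'b123', 'c123', 'c123', 'd123', 'd123'],
})

# cumcount(ascending=False) counts down to 0 within each group,
# so 0 marks the last row of each Policy_id in the current row order
rank_from_end = df.groupby('Policy_id').cumcount(ascending=False)
df['Last_dup'] = rank_from_end.eq(0).astype(int)
print(df)
```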

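User

Answer 2 · edited 2024-09-30 05:22:09

Use Series.duplicated or DataFrame.duplicated with the column specified and keep='last', then invert the mask and cast it to integers so that True/False map to 1/0, or use numpy.where: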

import numpy as np
# Either line works: invert the mask and cast to int, or map it with np.where
df['Last_dup1'] = (~df['Policy_id'].duplicated(keep='last')).astype(int)
df['Last_dup1'] = np.where(df['Policy_id'].duplicated(keep='last'), 0, 1)

Or:

df['Last_dup1'] = (~df.duplicated(subset=['Policy_id'], keep='last')).astype(int)
df['Last_dup1'] = np.where(df.duplicated(subset=['Policy_id'], keep='last'), 0, 1)
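To see why the mask is inverted, here is the intermediate result of duplicated(keep='last') on the Policy_id values from the question: it is True for every row that has a later duplicate, so only the last occurrence of each value is False.

```python
import numpy as np
import pandas as pd

policy = pd.Series(['b123', 'b123', 'b123', 'c123', 'c123', 'd123', 'd123'])

# True where another row with the same value appears later
mask = policy.duplicated(keep='last')
print(mask.tolist())

# Inverting the mask flags the last occurrence; astype(int) maps True/False to 1/0
flag_invert = (~mask).astype(int)
# np.where gives the same 1/0 result without the inversion
flag_where = np.where(mask, 0, 1)
print(flag_invert.tolist(), flag_where.tolist())
```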

print(df)
   Id Policy_id  Start_Date  Last_dup  Last_dup1
0   0      b123  2019/02/24         0          0
1   1      b123  2019/03/24         0          0
2   2      b123  2019/04/24         1          1
3   3      c123  2018/09/01         0          0
4   4      c123  2018/10/01         1          1
5   5      d123  2017/02/24         0          0
6   6      d123  2017/03/24         1          1
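duplicated(keep='last') relies on row order. If "last" should mean the latest Start_Date and the frame might not be sorted, one option is to sort by date first and let index alignment put the flags back on the original rows. This is a sketch that assumes the dates always use the YYYY/MM/DD format shown in the question; the shuffled sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical rows, deliberately shuffled so the latest date is not the last row
df = pd.DataFrame({
    'Id': [1, 0, 2, 4, 3],
    'Policy_id': ['b123', 'b123', 'b123', 'c123', 'c123'],
    'Start_Date': ['2019/03/24', '2019/02/24', '2019/04/24', '2018/10/01', '2018/09/01'],
})

# Parse the dates so sorting is chronological (format assumed to be YYYY/MM/DD)
dates = pd.to_datetime(df['Start_Date'], format='%Y/%m/%d')

# Stable sort by date, then flag the last (latest) row per Policy_id in that order
order = dates.sort_values(kind='stable').index
is_last = ~df.loc[order, 'Policy_id'].duplicated(keep='last')

# Assignment aligns on the index, so each flag lands on its original row
df['Last_dup'] = is_last.astype(int)
print(df)
```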
