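I need to set some rules for a pandas groupby. Before grouping by datetime, I'd like to ignore the rows whose ['keep'] column already contains "dup by".
Here is my code: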
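import pandas as pd
import numpy as np

df = pd.read_csv("sample.csv", delimiter='|')
# Empty or unparseable timestamps become NaT
df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce')

# Latest datetime per VIP_ID, attached to every row as 'most_recent_date'
most_recent_date = df.groupby('VIP_ID')['datetime'].max()
most_recent_date = most_recent_date.rename("most_recent_date")
df = df.join(most_recent_date, on="VIP_ID")

# Helper key: 'same tier' rows are keyed by VIP_ID + datetime, all other rows by their own ID
df['both'] = np.where(
    df['keep'] == 'same tier',
    df['VIP_ID'] + df['datetime'].astype(str),
    df['ID'].astype(str)
)

# Rows that are not 'same tier' keep their label; 'same tier' rows become 'yes'
# if they hold the VIP_ID's latest datetime, otherwise 'dup by <VIP_ID>'
df['keep'] = np.where(
    df['keep'] != 'same tier',
    df['keep'],
    np.where(
        df['most_recent_date'] == df['datetime'],
        'yes',
        'dup by ' + df['VIP_ID'].astype(str)
    )
)

# 'same tier' rows that share both VIP_ID and datetime are flagged 'same time'
df.loc[df.duplicated(subset=['both'], keep=False), 'keep'] = 'same time'
df = df.drop(columns=['both', 'most_recent_date'])
print(df)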
This code turns every value in the keep column into "dup by".
Sample CSV:
ID|VIP_ID|TIER|datetime|keep
1|F08210020403|GO|2014-05-17 00:00:00|same tier
2|F08210020403|GO|2014-04-18 00:00:00|same tier
3|F08210020403|FO||dup by F08210020403
4|F08210020403|FO||dup by F08210020403
5|F08210020403|FO|2016-09-18 00:00:00|dup by F08210020403
6|F08210020403|FO|2016-05-10 00:00:00|dup by F08210020403
7|F08210020403|FO||dup by F08210020403
8|F08210020403|FO||dup by F08210020403
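Because 2016-09-18 00:00:00 is the latest datetime for F08210020403, every value in keep ends up as "dup by F08210020403". I'd like a rule that ignores any row whose keep value already contains "dup", and only after that run the groupby to decide which row to keep.
Here is my output: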
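The diagnosis above can be checked directly. The sketch below is illustrative, not the asker's code: it assumes the header is spelled `datetime` (as the code expects), loads the sample through `io.StringIO` instead of `sample.csv`, and uses hypothetical names (`SAMPLE`, `is_dup`, `max_all`, `max_kept`). It compares the per-VIP_ID maximum over all rows with the maximum over rows whose keep value does not contain "dup by".

```python
import io

import pandas as pd

# Sample rows from the question (header assumed to be 'datetime')
SAMPLE = """ID|VIP_ID|TIER|datetime|keep
1|F08210020403|GO|2014-05-17 00:00:00|same tier
2|F08210020403|GO|2014-04-18 00:00:00|same tier
3|F08210020403|FO||dup by F08210020403
4|F08210020403|FO||dup by F08210020403
5|F08210020403|FO|2016-09-18 00:00:00|dup by F08210020403
6|F08210020403|FO|2016-05-10 00:00:00|dup by F08210020403
7|F08210020403|FO||dup by F08210020403
8|F08210020403|FO||dup by F08210020403
"""

df = pd.read_csv(io.StringIO(SAMPLE), delimiter='|')
df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce')

# Maximum over every row, as in the original code: a 'dup by' row wins
max_all = df.groupby('VIP_ID')['datetime'].max()

# Maximum over rows whose keep value does not already contain "dup by"
is_dup = df['keep'].str.contains('dup by', regex=False, na=False)
max_kept = df[~is_dup].groupby('VIP_ID')['datetime'].max()

print(max_all['F08210020403'])   # 2016-09-18 00:00:00
print(max_kept['F08210020403'])  # 2014-05-17 00:00:00
```

Neither 'same tier' row matches 2016-09-18, which is why both of them are relabelled "dup by".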
1|F08210020403|GO|2014-05-17 00:00:00|dup by F08210020403
2|F08210020403|GO|2014-04-18 00:00:00|dup by F08210020403
3|F08210020403|FO||dup by F08210020403
4|F08210020403|FO||dup by F08210020403
5|F08210020403|FO|2016-09-18 00:00:00|dup by F08210020403
6|F08210020403|FO|2016-05-10 00:00:00|dup by F08210020403
7|F08210020403|FO||dup by F08210020403
8|F08210020403|FO||dup by F08210020403
Expected output:
1|F08210020403|GO|2014-05-17 00:00:00|yes
2|F08210020403|GO|2014-04-18 00:00:00|dup by F08210020403
3|F08210020403|FO||dup by F08210020403
4|F08210020403|FO||dup by F08210020403
5|F08210020403|FO|2016-09-18 00:00:00|dup by F08210020403
6|F08210020403|FO|2016-05-10 00:00:00|dup by F08210020403
7|F08210020403|FO||dup by F08210020403
8|F08210020403|FO||dup by F08210020403
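One possible way to get this output, sketched under the same assumptions (header spelled `datetime`, data loaded via `io.StringIO`, hypothetical names `SAMPLE`, `is_dup`, `same_tier`): filter out the "dup by" rows only when computing `most_recent_date`, then label the 'same tier' rows exactly as the original code does.

```python
import io

import numpy as np
import pandas as pd

# Sample rows from the question (header assumed to be 'datetime')
SAMPLE = """ID|VIP_ID|TIER|datetime|keep
1|F08210020403|GO|2014-05-17 00:00:00|same tier
2|F08210020403|GO|2014-04-18 00:00:00|same tier
3|F08210020403|FO||dup by F08210020403
4|F08210020403|FO||dup by F08210020403
5|F08210020403|FO|2016-09-18 00:00:00|dup by F08210020403
6|F08210020403|FO|2016-05-10 00:00:00|dup by F08210020403
7|F08210020403|FO||dup by F08210020403
8|F08210020403|FO||dup by F08210020403
"""

df = pd.read_csv(io.StringIO(SAMPLE), delimiter='|')
df['datetime'] = pd.to_datetime(df['datetime'], errors='coerce')

# Rows already labelled "dup by ..." are left out of the max computation
is_dup = df['keep'].str.contains('dup by', regex=False, na=False)
most_recent_date = (
    df[~is_dup].groupby('VIP_ID')['datetime'].max().rename('most_recent_date')
)
df = df.join(most_recent_date, on='VIP_ID')

same_tier = df['keep'] == 'same tier'

# Helper key for spotting 'same tier' rows that share VIP_ID and datetime
df['both'] = np.where(
    same_tier,
    df['VIP_ID'] + df['datetime'].astype(str),
    df['ID'].astype(str),
)

# Only 'same tier' rows are relabelled; every other row keeps its label
df['keep'] = np.where(
    ~same_tier,
    df['keep'],
    np.where(
        df['datetime'] == df['most_recent_date'],
        'yes',
        'dup by ' + df['VIP_ID'].astype(str),
    ),
)

df.loc[df.duplicated(subset=['both'], keep=False), 'keep'] = 'same time'
df = df.drop(columns=['both', 'most_recent_date'])
print(df)
```

The "dup by" rows are excluded only from the comparison, so they still appear in the result with their original labels.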
Any help would be appreciated.
IIUC, try: