基于更复杂的条件删除pandas中的行

time id type <deleted because for id 1 the date is not the max value and the type differs from the type of the max date for id 1> 2013-11-02 1 xF1yz <deleted because for id 1 the date is not the max value and the type differs from the type of the max date for id 1> <deleted because for id 1 the date is not the max value and the type differs from the type of the max date for id 1> 2013-11-02 1 xF1yz 2006-07-07 5 F5spo 2006-07-06 5 F5spo //kept because although the date is not max, it has the same type as the row with the max date for id 5 <deleted because for id 5 the date is not the max value and the type differs from the type of the max date for id 5>

3条回答

网友

1楼 · 编辑于 2024-06-24 12:41:49

使用^{}获取最大值的索引，只过滤列id和type和^{}：

df = df.merge(df.loc[df.groupby('id')['time'].idxmax(), ['id','type']])
print (df)
        time  id   type
0 2013-11-02   1  xF1yz
1 2013-11-02   1  xF1yz
2 2006-07-07   5  F5spo
3 2006-07-06   5  F5spo

或将^{}与^{}一起使用：

df = df.merge(df.sort_values('time').drop_duplicates('id', keep='last')[["id", "type"]])

网友

2楼 · 编辑于 2024-06-24 12:41:49

使用^{}、^{}和^{}^{}创建助手Series。然后使用^{}：

# If neccessary cast to datetime dtype
# df['time'] = pd.to_datetime(df['time'])

s = df.set_index('type').groupby('id')['time'].transform('idxmax')
df[df.type == s.values]

[输出]

        time  id   type
1 2013-11-02   1  xF1yz
4 2013-11-02   1  xF1yz
5 2006-07-07   5  F5spo
6 2006-07-06   5  F5spo

网友

3楼 · 编辑于 2024-06-24 12:41:49

您可以按时间对数据帧排序，然后按id分组，并选择每个分组中的最后一行。这是日期最大的一行。你知道吗

last_rows = df.sort_values('time').groupby('id').last()

然后将原始数据帧与新数据帧合并：

result = df.merge(last_rows, on=["id", "type"])
#       time_x  id   type      time_y
#0  2013-11-02   1  xF1yz  2013-11-02
#1  2013-11-02   1  xF1yz  2013-11-02
#2  2006-07-07   5  F5spo  2006-07-07
#3  2006-07-06   5  F5spo  2006-07-07

如果需要，请删除最后一个重复列：

result.drop('time_y', axis=1, inplace=True)

相关问题更多 >

编程相关推荐

热门问题

热门文章