Pandas advanced groupby and filtering by date

Published 2024-04-27 04:09:54

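I want to build the output DataFrame from the input below. For each ID, how do I keep the rows up to and including the first row where target == 1? In other words, within each ID I want to drop every row that comes after the first occurrence of target == 1, while keeping all the target == 0 rows that precede it.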

Input

ID   date         target
a1   2019-11-01   0
a1   2019-12-01   0
a1   2020-01-01   1
a1   2020-02-01   1
a1   2020-03-01   0
a2   2019-11-01   0
a2   2019-12-01   1
a2   2020-03-01   0
a2   2020-04-01   1

Output

ID   date         target
a1   2019-11-01   0
a1   2019-12-01   0
a1   2020-01-01   1
a2   2019-11-01   0
a2   2019-12-01   1

2 answers

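You can keep only the rows where the cumulative sum of target within each group is <= 1, then group again and use .ne to drop the zero that follows a 1.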

import pandas as pd
df = pd.DataFrame({'ID': ['a1', 'a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a2'],
 'date': ['2019-11-01',
  '2019-12-01',
  '2020-01-01',
  '2020-02-01',
  '2020-03-01',
  '2019-11-01',
  '2019-12-01',
  '2020-03-01',
  '2020-04-01'],
 'target': [0, 0, 1, 1, 0, 0, 1, 0, 1]})


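# Step 1: keep rows while the running count of 1s within each ID is at most 1
df = df.loc[df.groupby('ID')['target'].cumsum()<=1]
# Step 2: drop rows whose previous row in the same ID has target == 1
df = df.loc[df.groupby('ID')['target'].shift(1).ne(1)]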

Output

    ID  date    target
0   a1  2019-11-01  0
1   a1  2019-12-01  0
2   a1  2020-01-01  1
5   a2  2019-11-01  0
6   a2  2019-12-01  1
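
One caveat with the two-step filter above: the second step only removes the single row directly after a 1, so a group such as 0, 1, 0, 0 would keep its last zero. The following is a minimal sketch of a one-pass variant, assuming target only holds 0 and 1 and the goal is to keep every row up to and including the first 1 per ID. The helper name keep_until_first_one and the sample frame are illustrative and not part of the original answer.

```python
import pandas as pd


def keep_until_first_one(df, group_col="ID", target_col="target"):
    """Keep rows of each group up to and including the first target == 1."""
    # Running count of 1s within each group, including the current row.
    ones_so_far = df.groupby(group_col)[target_col].cumsum()
    # Subtracting the current value leaves the count of 1s strictly before the row.
    ones_before = ones_so_far - df[target_col]
    # A row survives only if no 1 has appeared earlier in its group.
    return df.loc[ones_before.eq(0)]


# Hypothetical sample with two zeros after the first 1 in group a1.
sample = pd.DataFrame({
    "ID": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2"],
    "target": [0, 1, 0, 0, 0, 1, 0, 1],
})
result = keep_until_first_one(sample)
print(result)
```

Because the mask depends only on the 1s seen before each row, it does not matter how many zeros or ones follow the first 1.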


import numpy as np
import pandas as pd
from io import StringIO

data = StringIO("""
uid,  date,         target
a1,   2019-11-01,   0
a1,   2019-12-01,   0
a1,   2020-01-01,   1
a1,   2020-02-01,   1
a1,   2020-03-01,   0
a2,   2019-11-01,   0
a2,   2019-12-01,   1
a2,   2020-03-01,   0
a2,   2020-04-01,   1
""")

# skipinitialspace drops the padding after each comma; the rename strips any spaces left in the header
df = pd.read_csv(data, skipinitialspace=True).rename(columns=lambda x: x.strip())

def filter_in_group(df: pd.DataFrame):
  # Position of the first maximum of target, i.e. the first 1 when the group contains one
  ind = np.argmax(df['target'].to_numpy())
  return df.loc[:, ['date', 'target']].iloc[:ind+1]

df_filtered = (
  df
  .groupby('uid')
  .apply(filter_in_group)
  .reset_index()
  .drop('level_1', axis=1)
)
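
Note that np.argmax returns 0 when a group contains no 1 at all, so the function above would keep only the first row of such a group. Below is a minimal sketch of a guarded variant, assuming that a group without any 1 should be kept whole. The name filter_in_group_safe and the demo frame (including the id a3) are hypothetical, and an explicit loop over the groups is used here instead of groupby.apply.

```python
import numpy as np
import pandas as pd


def filter_in_group_safe(group: pd.DataFrame) -> pd.DataFrame:
    """Like filter_in_group, but keep the whole group when it has no 1."""
    is_one = group["target"].to_numpy() == 1
    if not is_one.any():
        # Assumption: a group without any 1 is returned unchanged.
        return group[["date", "target"]]
    ind = int(np.argmax(is_one))  # position of the first 1
    return group[["date", "target"]].iloc[: ind + 1]


# Hypothetical data: a1 has a 1 in the middle, a3 has none.
demo = pd.DataFrame({
    "uid": ["a1", "a1", "a1", "a3", "a3"],
    "date": ["2019-11-01", "2019-12-01", "2020-01-01", "2019-11-01", "2019-12-01"],
    "target": [0, 1, 0, 0, 0],
})
parts = [
    filter_in_group_safe(g).assign(uid=uid)
    for uid, g in demo.groupby("uid")
]
demo_filtered = pd.concat(parts, ignore_index=True)[["uid", "date", "target"]]
print(demo_filtered)
```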



