Deleting dates based on a condition in Python

Posted 2024-04-20 01:07:51

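I am trying to implement a condition: if the count of incorrect values on a date is greater than 2 (2019-05-17 and 2019-05-20 in the example below), drop that entire date (all of its time slots).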

Input

                    t_value C/IC
2019-05-17 00:00:00   0     incorrect
2019-05-17 01:00:00   0     incorrect 
2019-05-17 02:00:00   0     incorrect 
2019-05-17 03:00:00   4     correct
2019-05-17 04:00:00   5     correct 
2019-05-18 01:00:00   0     incorrect   
2019-05-18 02:00:00   6     correct  
2019-05-18 03:00:00   7     correct 
2019-05-19 04:00:00   0     incorrect
2019-05-19 09:00:00   0    incorrect 
2019-05-19 11:00:00   8    correct
2019-05-20 07:00:00   2    correct
2019-05-20 08:00:00   0    incorrect
2019-05-20 09:00:00   0    incorrect
2019-05-20 07:00:00   0    incorrect 

Expected output

                    t_value C/IC 
2019-05-18 01:00:00   0     incorrect   
2019-05-18 02:00:00   6     correct  
2019-05-18 03:00:00   7     correct 
2019-05-19 04:00:00   0     incorrect
2019-05-19 09:00:00   0    incorrect 
2019-05-19 11:00:00   8    correct

I am not sure which time-based operation to use to get the desired result. Thanks.


2 answers
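import numpy as np
import pandas as pd
from io import StringIO

#read in data (`data` is the input table from the question, held as a string)
df = pd.read_csv(StringIO(data), sep=r'\s{2,}', engine='python')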

#give index a name 
df.index.name = 'Date'
#convert to datetime 
#and sort index
#usually safer to sort datetime index in Pandas
df.index = pd.to_datetime(df.index)
df = df.sort_index()

res = (df
       #group by date and c/ic
       .groupby([pd.Grouper(freq='1D',level='Date'),"C/IC"])
       .size()
       #get rows greater than 2 and incorrect
       .loc[lambda x: x>2,"incorrect"]
       #keep only the date index
       .droplevel(-1)
       .index
       #datetime information trapped here
       #and due to grouping, it is different from initial datetime
       #as such, we convert to string 
       #and build another batch of dates
       .astype(str)
       .tolist()
      )

res
['2019-05-17', '2019-05-20']

#build a numpy array of dates
idx = np.array(res, dtype='datetime64')

#exclude dates in idx and get final value
#aim is to get dates, irrespective of time

df.loc[~np.isin(df.index.date,idx)]

                     t_value    C/IC
Date        
2019-05-18 01:00:00     0   incorrect
2019-05-18 02:00:00     6   correct
2019-05-18 03:00:00     7   correct
2019-05-19 04:00:00     0   incorrect
2019-05-19 09:00:00     0   incorrect
2019-05-19 11:00:00     8   correct
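
The string round-trip in the answer above is one way to line up grouped dates with the original timestamps. As a minimal sketch of an alternative, assuming the index is already a `DatetimeIndex`, the days can be compared directly via `DatetimeIndex.normalize()`. The small sample frame and the names `days`, `incorrect_per_day`, `bad_days` and `filtered` are illustrative and not part of the original answer.

```python
import pandas as pd

# Small sample in the same shape as the question's data (illustrative values).
df = pd.DataFrame(
    {
        "t_value": [0, 0, 0, 4, 0, 6, 7],
        "C/IC": ["incorrect", "incorrect", "incorrect", "correct",
                 "incorrect", "correct", "correct"],
    },
    index=pd.to_datetime([
        "2019-05-17 00:00", "2019-05-17 01:00", "2019-05-17 02:00",
        "2019-05-17 03:00", "2019-05-18 01:00", "2019-05-18 02:00",
        "2019-05-18 03:00",
    ]),
)

# Midnight timestamp of each row's day; stays datetime64, so no string conversion.
days = df.index.normalize()

# Number of "incorrect" rows per day.
incorrect_per_day = (df["C/IC"] == "incorrect").groupby(days).sum()

# Days with more than 2 incorrect rows.
bad_days = incorrect_per_day[incorrect_per_day > 2].index

# Keep only the rows whose day is not in bad_days.
filtered = df[~days.isin(bad_days)]
print(filtered)
```

Here 2019-05-17 has three incorrect rows and is dropped in full, while 2019-05-18 has only one and is kept.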

Sorry, I misunderstood the question.

Updated answer: you can find the dates to remove as follows:

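# a DatetimeIndex exposes .date directly (the .dt accessor is for Series only)
df['_date'] = df.index.date
incorrect_df = df[df['C/IC'] == 'incorrect']
# count the incorrect rows per date; the result is indexed by '_date'
incorrect_count = incorrect_df.groupby('_date')['C/IC'].count()
# using set to make the later step more efficient if the df is long
dates_to_remove = set(incorrect_count[incorrect_count > 2].index)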

Then mask the DataFrame accordingly:

mask = [x not in dates_to_remove for x in df['_date']]
res = df[mask]
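
The find-then-mask idea can also be folded into a single step with `groupby(...).transform`, which avoids the helper column. This is only a sketch under assumptions: the function name `drop_dates_with_many_incorrect` and its parameters are hypothetical, and the sample rows are a subset of the question's data.

```python
import pandas as pd

def drop_dates_with_many_incorrect(df, label_col="C/IC", bad_label="incorrect", limit=2):
    """Hypothetical helper: drop every calendar date with more than `limit` bad rows."""
    # 1 for a bad row, 0 otherwise; strip() tolerates trailing spaces in the labels.
    is_bad = (df[label_col].str.strip() == bad_label).astype(int)
    # Broadcast each date's bad-row count back onto every row of that date.
    bad_per_row = is_bad.groupby(df.index.date).transform("sum")
    # Positional boolean mask, so duplicate timestamps in the index are harmless.
    return df[(bad_per_row <= limit).to_numpy()]

# Subset of the question's data, including its repeated 07:00:00 timestamp.
sample = pd.DataFrame(
    {
        "t_value": [0, 0, 8, 2, 0, 0, 0],
        "C/IC": ["incorrect", "incorrect ", "correct", "correct",
                 "incorrect", "incorrect", "incorrect "],
    },
    index=pd.to_datetime([
        "2019-05-19 04:00", "2019-05-19 09:00", "2019-05-19 11:00",
        "2019-05-20 07:00", "2019-05-20 08:00", "2019-05-20 09:00",
        "2019-05-20 07:00",
    ]),
)

result = drop_dates_with_many_incorrect(sample)
print(result)
```

With `limit=2`, 2019-05-19 (two incorrect rows) is kept and 2019-05-20 (three incorrect rows) is removed entirely.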
