I have a df like this:
UNIT EXITSn_hourly Interval
1867 R081 104 00:00:00-04:00:00
1868 R081 0 04:00:00-04:00:00
1869 R081 129 04:00:00-08:00:00
1870 R081 521 08:00:00-12:00:00
1871 R081 1048 12:00:00-16:00:00
2838 R032 38 00:00:00-04:00:00
2839 R032 0 04:00:00-04:00:00
2840 R032 89 04:00:00-08:00:00
2841 R032 470 08:00:00-12:00:00
I need to delete the whole row when Interval has this particular format:
1868 R081 0 04:00:00-04:00:00
I need to delete not only 04:00:00-04:00:00 but also similar values (any interval whose start and end times are the same), such as
01:00:00-01:00:00
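A compact way to test for the pattern described here (an interval whose two halves are the same time) is a regex backreference. This is a minimal sketch, assuming pandas 1.1 or later for Series.str.fullmatch; the frame reuses R081 rows from the question plus one hypothetical 01:00:00-01:00:00 row to show the "similar values" case:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "UNIT": ["R081", "R081", "R081", "R081"],
        "EXITSn_hourly": [104, 0, 129, 0],
        "Interval": [
            "00:00:00-04:00:00",
            "04:00:00-04:00:00",
            "04:00:00-08:00:00",
            "01:00:00-01:00:00",  # hypothetical row, not from the question's data
        ],
    }
)

# \1 repeats whatever the first group matched, so the pattern only matches
# when the time after '-' is identical to the time before it.
same = df["Interval"].str.fullmatch(r"(\d{2}:\d{2}:\d{2})-\1", na=False)

# Keep every row that is NOT a zero-length interval.
result = df[~same]
print(result)
```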
Actually, this is my original df. I created the Interval column from it:
C/A UNIT SCP DATEn TIMEn DESCn ENTRIESn EXITSn
0 A002 R051 02-00-00 06-29-13 00:00:00 REGULAR 4174592 1433672
1 A002 R051 02-00-00 06-29-13 04:00:00 REGULAR 4174628 1433675
2 A002 R051 02-00-00 06-29-13 08:00:00 REGULAR 4174641 1433706
3 A002 R051 02-00-00 06-29-13 12:00:00 REGULAR 4174741 1433775
4 A002 R051 02-00-00 06-29-13 16:00:00 REGULAR 4174936 1433826
5 A002 R051 02-00-00 06-29-13 20:00:00 REGULAR 4175270 1433877
6 A002 R051 02-00-00 06-30-13 00:00:00 REGULAR 4175403 1433908
7 A002 R051 02-00-00 06-30-13 04:00:00 REGULAR 4175441 1433914
8 A002 R051 02-00-00 06-30-13 08:00:00 REGULAR 4175457 1433928
9 A002 R051 02-00-00 06-30-13 12:00:00 REGULAR 4175520 1433981
I created Interval with this code:
import copy
# turnstile_data is the original DataFrame shown above
df = copy.deepcopy(turnstile_data)
# previous reading for every row (the first row becomes NaN)
pdf = df.shift(periods=1)
# counts since the previous reading
df['ENTRIESn_hourly'] = df['ENTRIESn'] - pdf['ENTRIESn'].fillna(0)
df['EXITSn_hourly'] = df['EXITSn'] - pdf['EXITSn'].fillna(0)
# label each row "previous TIMEn-current TIMEn"
df['Interval'] = pdf['TIMEn'] + '-' + df['TIMEn']
df.loc[(df['ENTRIESn'] == 0), 'ENTRIESn_hourly'] = 0
df.loc[(df['EXITSn'] == 0), 'EXITSn_hourly'] = 0
# mark the first reading of each turnstile (C/A, UNIT or SCP changed)
df.loc[(df['C/A'] != pdf['C/A']) | (df['UNIT'] != pdf['UNIT']) | (df['SCP'] != pdf['SCP']), ['ENTRIESn_hourly', 'EXITSn_hourly', 'Interval']] = 0
# drop those first readings
df = df[df.Interval != 0]
print(df.head(20))
head7 = copy.deepcopy(df)
required_df = head7[['UNIT', 'EXITSn_hourly', 'Interval']].groupby(head7.UNIT)
print(required_df.head(5))
You can compare the parts of the string (start time vs. end time) and then remove the matching rows with boolean indexing:
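A minimal sketch of that comparison, assuming every Interval value is a fixed-width string of the form HH:MM:SS-HH:MM:SS (so the start time is characters 0 to 7 and the end time begins at position 9). The rows are the R081 rows from the question:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "UNIT": ["R081", "R081", "R081", "R081"],
        "EXITSn_hourly": [104, 0, 129, 521],
        "Interval": [
            "00:00:00-04:00:00",
            "04:00:00-04:00:00",
            "04:00:00-08:00:00",
            "08:00:00-12:00:00",
        ],
    },
    index=[1867, 1868, 1869, 1870],
)

# Fixed-width slices: [:8] is the start time, [9:] is the end time
# (position 8 is the '-' separator).
start = df["Interval"].str[:8]
end = df["Interval"].str[9:]

# Keep only rows whose start and end differ; this drops 04:00:00-04:00:00,
# 01:00:00-01:00:00 and any other zero-length interval.
df = df[start != end]
print(df)
```

If the time strings could ever vary in width, the slice positions would be wrong, and splitting on '-' is the safer choice.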
EDIT:
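I checked your code. Maybe you can omit copy.deepcopy and use df.copy() instead, which makes a deep copy by default. You may also need to split Interval into Interval_start and Interval_end and check whether they are equal: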
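A minimal sketch of both suggestions. The frame name interval_df is hypothetical and stands in for the data after Interval has been built; its rows are the R032 rows from the question:

```python
import pandas as pd

# Hypothetical stand-in for the frame that already has an Interval column.
interval_df = pd.DataFrame(
    {
        "UNIT": ["R032", "R032", "R032", "R032"],
        "EXITSn_hourly": [38, 0, 89, 470],
        "Interval": [
            "00:00:00-04:00:00",
            "04:00:00-04:00:00",
            "04:00:00-08:00:00",
            "08:00:00-12:00:00",
        ],
    },
    index=[2838, 2839, 2840, 2841],
)

# DataFrame.copy() defaults to deep=True, so copy.deepcopy is not needed.
df = interval_df.copy()

# Split "start-end" on the first '-' into two new columns.
df[["Interval_start", "Interval_end"]] = df["Interval"].str.split("-", n=1, expand=True)

# Keep only rows whose start and end times differ.
df = df[df["Interval_start"] != df["Interval_end"]]
print(df)
```

Splitting does not depend on the HH:MM:SS width, and it keeps both endpoints as columns in case you later want to sort or group by start time.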