How to filter out data points whose timestamps are very close together

3 answers

User

#1 · edited 2024-05-20 04:08:11

There may be a simpler way to do this, but I believe the following works for the conditions you describe:

# reset the index to 0..n-1 so that "index - 1" means "the row above"
df = df.reset_index()

# rows with a small time difference (the first row has no row above it)
small_timediffs = df.loc[(df.Timediff <= 0.1) & (df.index > 0)]
# rows immediately above the rows with a small time difference
prev_small_timediffs = df.loc[small_timediffs.index - 1]

# helper to compare timestamps without milliseconds
remove_milliseconds = lambda x: str(x).split('.')[0]

# check whether each small-timediff row and the row above it
# share the same timestamp down to the second
stamps = small_timediffs.timestamp.apply(remove_milliseconds).values
prev_stamps = prev_small_timediffs.timestamp.apply(remove_milliseconds).values
cond = (stamps == prev_stamps)

# keep everything except the rows above the matches
# (small timediff and same timestamp)
result = df.loc[~df.index.isin(small_timediffs.loc[cond].index - 1)]
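
The same idea can also be sketched without index arithmetic. This is only an illustration under assumptions, not the answer's own code: it assumes the timestamp column has been parsed to datetimes, the sample rows are made up for the example, and the helper name drop_row_before_close_pair and its parameters are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; Timediff is the gap in seconds to the previous row.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2012-03-21 16:23:21.323",
        "2012-03-21 16:23:21.423",
        "2012-03-22 04:27:04.550",
        "2012-03-22 04:27:11.670",
    ]),
    "data": [19.5, 20.0, 19.5, 19.5],
})
df["Timediff"] = df["timestamp"].diff().dt.total_seconds()


def drop_row_before_close_pair(frame, max_gap=0.1, tol=1e-9):
    """Drop each row whose successor follows within max_gap seconds
    and falls in the same whole second (hypothetical helper)."""
    second = frame["timestamp"].dt.floor("s")
    # True where a row is close to, and in the same second as, the row above it.
    close_to_prev = (frame["Timediff"] <= max_gap + tol) & second.eq(second.shift())
    # Shift the flag up one row so that it marks the earlier row of each pair.
    is_earlier_of_pair = close_to_prev.shift(-1, fill_value=False)
    return frame[~is_earlier_of_pair]


result = drop_row_before_close_pair(df)
print(result)
```

Because the comparison uses shift() rather than "index - 1", this variant does not need reset_index() and leaves the original index labels untouched.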

User

#2 · edited 2024-05-20 04:08:11

I added some values to your DataFrame so that several records with a timediff of 0.100 share the same hour, minute, and second:

import pandas as pd

df = pd.DataFrame({'data': ['16:23:21.323 19.5', '16:23:21.423 20.0', '04:27:04.550 19.5',
                            '04:27:11.670 19.5', '06:12:24.523 19.0', '06:12:24.623 18.5',
                            '16:23:21.323 19.5', '06:12:24.523 19.0'],
                   'timestamp': ['2012-03-21', '2012-03-21', '2012-03-22', '2012-03-22',
                                 '2012-03-23', '2012-03-23', '2012-03-23', '2012-03-23'],
                   'timediff': [28732.920, 0.100, 43423.127, 7.120, 0.100, 0.100, 92712.853, 0.100]})

# extract the time part (everything before the last space) from the data column
df['h_m_s'] = df['data'].str.extract(r'(.*) ', expand=False)

                data   timediff   timestamp         h_m_s
0  16:23:21.323 19.5  28732.920  2012-03-21  16:23:21.323
1  16:23:21.423 20.0      0.100  2012-03-21  16:23:21.423
2  04:27:04.550 19.5  43423.127  2012-03-22  04:27:04.550
3  04:27:11.670 19.5      7.120  2012-03-22  04:27:11.670
4  06:12:24.523 19.0      0.100  2012-03-23  06:12:24.523
5  06:12:24.623 18.5      0.100  2012-03-23  06:12:24.623
6  16:23:21.323 19.5  92712.853  2012-03-23  16:23:21.323
7  06:12:24.523 19.0      0.100  2012-03-23  06:12:24.523

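# split off the rows whose timediff is exactly 0.100
df_same = df[df.timediff == 0.100]
df = df[df.timediff != 0.100]
# among those rows, keep only the first occurrence of each time
df_same = df_same.drop_duplicates(subset=['h_m_s'], keep='first')
# append the de-duplicated rows back to the remaining rows
df = pd.concat([df, df_same])
print(df)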
                data   timediff   timestamp         h_m_s
0  16:23:21.323 19.5  28732.920  2012-03-21  16:23:21.323
2  04:27:04.550 19.5  43423.127  2012-03-22  04:27:04.550
3  04:27:11.670 19.5      7.120  2012-03-22  04:27:11.670
6  16:23:21.323 19.5  92712.853  2012-03-23  16:23:21.323
1  16:23:21.423 20.0      0.100  2012-03-21  16:23:21.423
4  06:12:24.523 19.0      0.100  2012-03-23  06:12:24.523
5  06:12:24.623 18.5      0.100  2012-03-23  06:12:24.623
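
Note that the concatenated result above is no longer in the original row order. A possible variant, sketched here as an assumption rather than as part of the answer, flags the repeated rows with duplicated() and drops them in place; it also uses numpy.isclose instead of exact float equality. The frame below repeats only the h_m_s and timediff values from the example.

```python
import numpy as np
import pandas as pd

# Times and gaps copied from the example above; the other columns are omitted.
df = pd.DataFrame({
    "h_m_s": ["16:23:21.323", "16:23:21.423", "04:27:04.550", "04:27:11.670",
              "06:12:24.523", "06:12:24.623", "16:23:21.323", "06:12:24.523"],
    "timediff": [28732.920, 0.100, 43423.127, 7.120, 0.100, 0.100, 92712.853, 0.100],
})

# Tolerant comparison instead of exact float equality.
is_small = pd.Series(np.isclose(df["timediff"], 0.1), index=df.index)
# Among the small-gap rows only, flag repeats of an already seen time.
is_repeat = df.loc[is_small].duplicated(subset=["h_m_s"], keep="first")
# Drop the flagged rows; every other row keeps its original position.
result = df.drop(index=is_repeat[is_repeat].index)
print(result)
```

This keeps the same rows as the split-and-concat version, only in their original order.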

User

#3 · edited 2024-05-20 04:08:11

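The idea is to group by the datetimes floored to the second, and then remove the preceding row if the group contains at least one match of the mask m1: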

# convert to datetimes
df['timestamp'] = pd.to_datetime(df['timestamp'])

# rows whose Timediff is 0.1
m1 = df['Timediff'].eq(0.1)
# rows directly followed by a row matching m1
m2 = m1.shift(-1, fill_value=False)
# rows whose group (timestamps floored to the second) contains at least one match of m1
m3 = m1.groupby(df['timestamp'].dt.floor('s')).transform('any')
# combine m3 with m2 and invert the mask to filter
df = df[~(m3 & m2)]
print(df)
            Id               timestamp  data  sig  events1  Start  Peak  \
104514  104755 2012-03-21 16:23:21.423  20.0 -1.0      0.0    0.0   1.0   
104623  104864 2012-03-22 04:27:04.550  19.5  0.0      0.0    0.0   0.0   
104630  104871 2012-03-22 04:27:11.670  19.5 -1.0      0.0    0.0   1.0   
105148  105389 2012-03-23 06:12:24.623  18.5  1.0      1.0    0.0   0.0   

         Timediff  Datadiff  
104514      0.100       0.5  
104623  43423.127      -0.5  
104630      7.120       0.0  
105148      0.100      -0.5
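
To see how the three masks interact, it can help to lay them out side by side on a small frame. The sketch below is illustrative only: the six rows are assembled from timestamps and Timediff values quoted in this thread (the question's other columns are left out), and the masks DataFrame is an addition made for inspection.

```python
import pandas as pd

# Small frame assembled from values quoted in this thread; treat it as illustrative.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2012-03-21 16:23:21.323", "2012-03-21 16:23:21.423",
        "2012-03-22 04:27:04.550", "2012-03-22 04:27:11.670",
        "2012-03-23 06:12:24.523", "2012-03-23 06:12:24.623",
    ]),
    "Timediff": [28732.920, 0.100, 43423.127, 7.120, 92712.853, 0.100],
})

# Row is 0.1 s after the previous row.
m1 = df["Timediff"].eq(0.1)
# Row is directly followed by a row matching m1.
m2 = m1.shift(-1, fill_value=False)
# Row's whole-second group contains at least one match of m1.
m3 = m1.groupby(df["timestamp"].dt.floor("s")).transform("any")

# Collect the masks in one table to inspect which rows will be dropped.
masks = pd.DataFrame({"m1": m1, "m2": m2, "m3": m3, "drop": m3 & m2})
print(masks)

result = df[~(m3 & m2)]
print(result)
```

Only the earlier row of each close pair has both m2 and m3 set, so it is the one removed, while the later row of the pair stays.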

Edit: in response to the comment, with additional conditions:

m1 = df3['Timediff'].eq(0.100) & df3['Start'].ne(1) & df3['Start'].shift(1).ne(1)
