Finding the nearest timestamps between DataFrame columns

import numpy as np
import pandas as pd

test1 = pd.date_range(start='1/1/2018', end='1/10/2018')
test1 = pd.DataFrame(test1)
test1.rename(columns={list(test1)[0]: 'time'}, inplace=True)

test2 = pd.date_range(start='1/5/2018', end='1/20/2018')
test2 = pd.DataFrame(test2)
test2.rename(columns={list(test2)[0]: 'time'}, inplace=True)

def nearest(items, pivot):
    return min(items, key=lambda x: abs(x - pivot))

for k in range(len(test1)):
    a = nearest(test2['time'], test1['time'][k])  # find nearest timestamp in the second dataframe
    b = test2.index[test2['time'] == a].tolist()[0]  # identify the index of this timestamp
    test1.loc[k, 'value'] = b  # assign this value to the cell (creates the column on first use)

1 Answer

User

#1 · Posted 2024-06-28 20:38:20

You can do this in one line with numpy's argmin:

test1['values'] = test1['time'].apply(lambda t: np.argmin(np.absolute(test2['time'] - t)))

Note that applying a lambda function is essentially a loop as well, so check whether its performance meets your requirements.
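If the apply-based version turns out to be too slow, one option is to let pandas do the matching. The sketch below swaps in pd.merge_asof with direction='nearest', which is a different technique from the argmin one-liner; it assumes both 'time' columns are sorted and that test2 has a default integer index. The helper names right and matched are only for illustration.

```python
import pandas as pd

test1 = pd.DataFrame({'time': pd.date_range(start='1/1/2018', end='1/10/2018')})
test2 = pd.DataFrame({'time': pd.date_range(start='1/5/2018', end='1/20/2018')})

# Expose test2's index as an ordinary column so it survives the merge.
right = test2.reset_index().rename(columns={'index': 'values'})

# merge_asof requires both frames to be sorted on the key column;
# direction='nearest' picks the closest test2 row for each test1 row.
matched = pd.merge_asof(test1, right, on='time', direction='nearest')

test1['values'] = matched['values'].to_numpy()
print(test1)
```

Dates in test1 that fall before the first test2 timestamp are matched to index 0, the same result the argmin version gives for this data.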

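You could also take advantage of the fact that your timestamps are sorted and that the difference between consecutive timestamps is constant (if I'm not mistaken). Compute the offset in days and derive the index vector from it, for example: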

offset = (test1['time'] - test2['time']).iloc[0].days
if offset < 0: # test1 time starts before test2 time, prepend zeros:
    offset = abs(offset)
    idx = np.append(np.zeros(offset), np.arange(len(test1['time'])-offset)).astype(int)
else: # test1 time starts after or with test2 time, use arange right away:
    idx = np.arange(offset, offset+len(test1['time']))
    
test1['values'] = idx
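As a possible simplification of the offset idea above, both branches can be folded into a single np.clip call, which also keeps the indices inside test2 when test1 runs past its end. This is a minimal sketch under the same assumptions as the answer: both columns are sorted, have a constant one-day step, and test2 has a default integer index.

```python
import numpy as np
import pandas as pd

test1 = pd.DataFrame({'time': pd.date_range(start='1/1/2018', end='1/10/2018')})
test2 = pd.DataFrame({'time': pd.date_range(start='1/5/2018', end='1/20/2018')})

# Signed number of days between the two start dates (negative if test1 starts first).
offset = (test1['time'].iloc[0] - test2['time'].iloc[0]).days

# Shift the positions by the offset, then clamp them to valid test2 positions.
idx = np.clip(np.arange(len(test1)) + offset, 0, len(test2) - 1)

test1['values'] = idx
print(test1)
```

The clamp to 0 plays the role of the prepended zeros, and the clamp to len(test2) - 1 covers the case the original else branch leaves unbounded.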
