Associate entries of one DataFrame with a second DataFrame based on time

Posted 2024-10-02 22:31:41


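I have two DataFrames. One contains my usual measurements (time-indexed). The second, from a different source, contains system states. It is also time-indexed, but the timestamps in the state DataFrame do not match the timestamps of the measurements. What I want is for every row of the measurement DataFrame to also contain the last state that appeared in the state DataFrame before the measurement time.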

For example, I have a state frame like this:

                                          state
time                                           
2013-02-14 12:29:37.101000          SystemReset
2013-02-14 12:29:39.103000             WaitFace
2013-02-14 12:29:39.103000      NormalExecution
2013-02-14 12:29:39.166000        GreetVisitors
2013-02-14 12:29:46.879000  AskForParticipation
2013-02-14 12:29:56.807000  IntroduceVernissage
2013-02-14 12:30:07.275000      PictureQuestion

My measurements look like this:

                            utime
time
2013-02-14 12:29:38.697038      0
2013-02-14 12:29:38.710432      1
2013-02-14 12:29:39.106475      2
2013-02-14 12:29:39.200701      3
2013-02-14 12:29:40.197014      0
2013-02-14 12:29:42.217976      5
2013-02-14 12:29:57.460601      7

I would like to end up with a DataFrame like this:

                            utime                 state
time
2013-02-14 12:29:38.697038      0           SystemReset
2013-02-14 12:29:38.710432      1           SystemReset
2013-02-14 12:29:39.106475      2       NormalExecution
2013-02-14 12:29:39.200701      3         GreetVisitors
2013-02-14 12:29:40.197014      0         GreetVisitors
2013-02-14 12:29:42.217976      5         GreetVisitors
2013-02-14 12:29:57.460601      7   IntroduceVernissage

I came up with the following solution, which is very inefficient:

result = measurements.copy()
stateList = []
for timestamp, _ in measurements.iterrows():
    candidateStates = states.truncate(after=timestamp).tail(1)
    if len(candidateStates) > 0:
        stateList.append(candidateStates['state'].values[0])
    else:
        stateList.append("unknown")

result['state'] = stateList

Is there a way to optimize this?


1 answer

Answer 1 · posted 2024-10-02 22:31:41

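Perhaps something like this would work (judging by the column order in the output, df1 is the state frame and df2 the measurements):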

df = df1.join(df2, how='outer')
df['state'] = df['state'].ffill()  # forward-fill; assign back rather than relying on inplace
df = df.dropna()                   # dropna returns a new frame, so keep the result

The join produces:

>>> df
                                          state  utime
time                                                  
2013-02-14 12:29:37.101000          SystemReset    NaN
2013-02-14 12:29:38.697038                  NaN      0
2013-02-14 12:29:38.710432                  NaN      1
2013-02-14 12:29:39.103000             WaitFace    NaN
2013-02-14 12:29:39.103000      NormalExecution    NaN
2013-02-14 12:29:39.106475                  NaN      2
2013-02-14 12:29:39.166000        GreetVisitors    NaN
2013-02-14 12:29:39.200701                  NaN      3
2013-02-14 12:29:40.197014                  NaN      0
2013-02-14 12:29:42.217976                  NaN      5
2013-02-14 12:29:46.879000  AskForParticipation    NaN
2013-02-14 12:29:56.807000  IntroduceVernissage    NaN
2013-02-14 12:29:57.460601                  NaN      7
2013-02-14 12:30:07.275000      PictureQuestion    NaN

Then we can forward-fill the state column:

>>> df['state'] = df['state'].ffill()
>>> df['state']
time
2013-02-14 12:29:37.101000            SystemReset
2013-02-14 12:29:38.697038            SystemReset
2013-02-14 12:29:38.710432            SystemReset
2013-02-14 12:29:39.103000               WaitFace
2013-02-14 12:29:39.103000        NormalExecution
2013-02-14 12:29:39.106475        NormalExecution
2013-02-14 12:29:39.166000          GreetVisitors
2013-02-14 12:29:39.200701          GreetVisitors
2013-02-14 12:29:40.197014          GreetVisitors
2013-02-14 12:29:42.217976          GreetVisitors
2013-02-14 12:29:46.879000    AskForParticipation
2013-02-14 12:29:56.807000    IntroduceVernissage
2013-02-14 12:29:57.460601    IntroduceVernissage
2013-02-14 12:30:07.275000        PictureQuestion
Name: state

Then drop the rows that have no utime:

>>> df.dropna()
                                          state  utime
time                                                  
2013-02-14 12:29:38.697038          SystemReset      0
2013-02-14 12:29:38.710432          SystemReset      1
2013-02-14 12:29:39.106475      NormalExecution      2
2013-02-14 12:29:39.200701        GreetVisitors      3
2013-02-14 12:29:40.197014        GreetVisitors      0
2013-02-14 12:29:42.217976        GreetVisitors      5
2013-02-14 12:29:57.460601  IntroduceVernissage      7
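
The three steps above can be reproduced end to end with the sample data from the question. This is only a minimal sketch: the names `states`, `measurements`, `joined` and `result` are chosen here for illustration, and it adds one step that is not in the answer, namely dropping duplicate state timestamps first so that the last state listed at a given instant wins (an assumption about the desired behavior). It also restricts `dropna` to the `utime` column, so a measurement taken before any state would be kept with a missing state instead of being dropped.

```python
import pandas as pd

# Sample data copied from the question; both frames are indexed by "time".
states = pd.DataFrame(
    {"state": ["SystemReset", "WaitFace", "NormalExecution", "GreetVisitors",
               "AskForParticipation", "IntroduceVernissage", "PictureQuestion"]},
    index=pd.to_datetime([
        "2013-02-14 12:29:37.101000", "2013-02-14 12:29:39.103000",
        "2013-02-14 12:29:39.103000", "2013-02-14 12:29:39.166000",
        "2013-02-14 12:29:46.879000", "2013-02-14 12:29:56.807000",
        "2013-02-14 12:30:07.275000",
    ]).rename("time"),
)
measurements = pd.DataFrame(
    {"utime": [0, 1, 2, 3, 0, 5, 7]},
    index=pd.to_datetime([
        "2013-02-14 12:29:38.697038", "2013-02-14 12:29:38.710432",
        "2013-02-14 12:29:39.106475", "2013-02-14 12:29:39.200701",
        "2013-02-14 12:29:40.197014", "2013-02-14 12:29:42.217976",
        "2013-02-14 12:29:57.460601",
    ]).rename("time"),
)

# Assumption: when two states share a timestamp, keep the one listed last.
states = states[~states.index.duplicated(keep="last")]

# Step 1: outer join, so every timestamp from either frame gets a row.
joined = states.join(measurements, how="outer")
# Step 2: carry the most recent state forward onto the measurement rows.
joined["state"] = joined["state"].ffill()
# Step 3: keep only rows that carry a measurement; the join turned utime
# into floats (because of NaN), so cast it back to integers.
result = joined.dropna(subset=["utime"]).astype({"utime": "int64"})

print(result)
```

Unlike the row-by-row loop in the question, this makes a single pass over the combined index.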

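You may need to adjust this to handle the case where a timestamp has both a utime and (possibly several) states. Perhaps drop_duplicates with take_last=True would do it (in current pandas versions the equivalent argument is keep='last'). You will also have to think about the < versus <= question more carefully than I have before my morning coffee.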
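
On the < versus <= question, one way to make the choice explicit is `pd.merge_asof`, which is a different technique from the join and forward-fill shown above and is not part of the original answer. The sketch below uses a small made-up data set; the helper name `attach_state` and the "unknown" placeholder (borrowed from the loop in the question) are assumptions for illustration. `merge_asof` expects both frames to be sorted by the key column.

```python
import pandas as pd

# Hypothetical, shortened data; "time" is a regular column here because
# merge_asof is called with on="time".
states = pd.DataFrame({
    "time": pd.to_datetime([
        "2013-02-14 12:29:37.101000",
        "2013-02-14 12:29:39.103000",
        "2013-02-14 12:29:39.166000",
    ]),
    "state": ["SystemReset", "NormalExecution", "GreetVisitors"],
})
measurements = pd.DataFrame({
    "time": pd.to_datetime([
        "2013-02-14 12:29:36.000000",  # earlier than any state
        "2013-02-14 12:29:38.697038",
        "2013-02-14 12:29:39.103000",  # exactly on a state timestamp
        "2013-02-14 12:29:40.197014",
    ]),
    "utime": [9, 0, 2, 0],
})


def attach_state(measurements, states, allow_exact_matches=True):
    """Attach to each measurement the most recent state at or before it.

    With allow_exact_matches=False only states strictly before the
    measurement time are considered.
    """
    merged = pd.merge_asof(
        measurements, states, on="time",
        direction="backward",
        allow_exact_matches=allow_exact_matches,
    )
    # Measurements with no earlier state get a placeholder label.
    merged["state"] = merged["state"].fillna("unknown")
    return merged.set_index("time")


inclusive = attach_state(measurements, states)                         # state time <= measurement time
strict = attach_state(measurements, states, allow_exact_matches=False)  # state time < measurement time
print(inclusive)
print(strict)
```

The two results differ only for the measurement that falls exactly on a state timestamp: the inclusive variant assigns it the state that starts at that instant, the strict variant assigns the previous one.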
