Efficiently adding rows where timestamps are not continuous

Posted 2024-09-26 22:08:03


I have the data below and need help with it: the timestamps contain discontinuities, and I want to fill each gap using the previous row.

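The whole dataset has a 30-minute interval, so rows 3 and 4 show the discontinuity: the timestamp jumps by one hour there, and by two hours at the next row. I want to fill each missing row with the values of the previous row, setting its timestamp to the previous timestamp plus 30 minutes.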
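To make the notion of a "discontinuity" concrete, here is a minimal sketch that locates the gaps and counts how many 30-minute rows are missing before each late row. The sample values and the `value` column are made up for illustration; only the `Timestamp` column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data: the values are invented, not taken from the question.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 00:30",
        "2020-01-01 01:00", "2020-01-01 02:00",
    ]),
    "value": [1.0, 2.0, 3.0, 4.0],
})

step = pd.Timedelta(minutes=30)

# Time elapsed between each row and the row before it (NaT for the first row).
delta = df["Timestamp"].diff()

# Rows that arrive later than the expected 30-minute step.
gaps = df[delta > step]

# Number of 30-minute rows missing in front of each late row.
missing = (delta[delta > step] // step) - 1
```

In this sample only the last row is late, and exactly one 30-minute row is missing in front of it.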

Input data:

^{tb1}$

Expected result:

^{tb2}$

I tried this code, but the result is completely different:

from datetime import timedelta

final_ds = []
for i in dataset.eqmt_id:
    for j in dataset.brand_brew_no:

        # Filter the rows for this equipment / brew combination
        data1 = dataset[(dataset['eqmt_id'] == i) & (dataset['brand_brew_no'] == j)]

        min_date = data1.Timestamp.min()
        max_date = data1.Timestamp.max()

        for ind, k in data1.iterrows():
            # If this is the first row, append it as it is
            if k['Timestamp'] == min_date:
                final_ds.append(k)

            # If this is the last row, just skip it
            elif k['Timestamp'] == max_date:
                print('b')

            # If the timestamp breaks continuity, create a row
            elif k['Timestamp'] != date_thirty:
                z = k
                z.Timestamp = date_thirty
                print(z)
                final_ds.append(z)

            # If the timestamp matches continuity, append directly
            elif k['Timestamp'] == date_thirty:
                final_ds.append(k)

            # Advance the expected timestamp by 30 minutes on every iteration
            date_thirty = k['Timestamp'] + timedelta(minutes=30)
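One thing worth noting about the loops above: iterating over `dataset.eqmt_id` and `dataset.brand_brew_no` visits every row value, so the same pair is processed many times. A possible alternative, sketched here under the assumption that timestamps are unique within each `(eqmt_id, brand_brew_no)` pair, is to let `groupby` visit each pair once and reindex it onto a complete 30-minute range. The sample frame and its `value` column are hypothetical.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
dataset = pd.DataFrame({
    "eqmt_id": ["A", "A", "A", "B", "B"],
    "brand_brew_no": ["x", "x", "x", "y", "y"],
    "Timestamp": pd.to_datetime([
        "2020-01-01 00:00", "2020-01-01 00:30", "2020-01-01 01:30",
        "2020-01-01 00:00", "2020-01-01 01:00",
    ]),
    "value": [1.0, 2.0, 3.0, 10.0, 20.0],
})

parts = []
# groupby yields each (eqmt_id, brand_brew_no) pair exactly once.
for _, group in dataset.groupby(["eqmt_id", "brand_brew_no"]):
    g = group.set_index("Timestamp").sort_index()
    # Every 30-minute slot between the first and last timestamp of the group.
    full = pd.date_range(g.index.min(), g.index.max(), freq="30min")
    # Missing slots become NaN rows, then take the values of the previous row.
    filled = g.reindex(full).ffill()
    parts.append(filled.rename_axis("Timestamp").reset_index())

result = pd.concat(parts, ignore_index=True)
```

Each inserted row carries the previous row's values with a timestamp 30 minutes later, which is the behavior the question describes.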

Edit 1: # dfz is the main df

import pandas as pd

appended_data = []
for i in dfz.eqmt_id:
    for j in dfz.brand_brew_no:
        # Rows for this equipment / brew combination
        df = dfz[(dfz['eqmt_id'] == i) & (dfz['brand_brew_no'] == j)]

        df.set_index(pd.to_datetime(df['Timestamp']), inplace=True)
        # Reindex onto a full 30-minute range and forward-fill the new rows
        df2 = df.reindex(
            pd.date_range(df.index.min(), df.index.max(), freq='30min')
        ).fillna(method='ffill')

        temp = df2.reset_index()
        appended_data.append(temp)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44-a1a0fbad32b2> in <module>
      4 
      5         df2.set_index(pd.to_datetime(df2['Timestamp']), inplace=True)
----> 6         df2 = pd.DataFrame(df2.reindex(pd.date_range(df2.Timestamp.min(), df2.Timestamp.max(), freq='30min')).fillna(method='ffill'))
      7         temp = df2.reset_index()
    


Error : ValueError: cannot reindex from a duplicate axis
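This error typically means the index being reindexed contains repeated labels, here most likely repeated timestamps within one filtered subset. The sketch below reproduces the failure on made-up data and shows one possible way around it. Keeping the first row for each timestamp is an assumption; whether duplicates should be dropped, averaged, or kept depends on the actual data.

```python
import pandas as pd

# Hypothetical data with a repeated timestamp (00:00 appears twice).
idx = pd.to_datetime(["2020-01-01 00:00", "2020-01-01 00:00", "2020-01-01 01:00"])
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=idx)

full = pd.date_range(df.index.min(), df.index.max(), freq="30min")

# reindex cannot map labels unambiguously when the index has duplicates.
try:
    df.reindex(full)
    raised = False
except ValueError:
    raised = True

# Assumption: keep only the first row for each timestamp.
deduped = df[~df.index.duplicated(keep="first")]

# With a unique index, reindexing and forward-filling works.
filled = deduped.reindex(full).ffill()
```

Checking `df.index.is_unique` on each subset before calling `reindex` is a quick way to confirm whether duplicates are the cause.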
