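I have this data and need some help: wherever the timestamps are discontinuous, I want to fill the gap using the previous row.
The whole dataset is at 30-minute intervals. If you look at rows 3 and 4, you can see a discontinuity: the timestamp jumps by one hour, and in the next row it jumps by two hours. I want to fill in the missing rows with the values of the preceding row, setting each new row's timestamp to the previous timestamp + 30 minutes.
Input data:
Expected output:
I tried this code, but the result is completely different: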
from datetime import timedelta

final_ds = []
for i in dataset.eqmt_id:
    for j in dataset.brand_brew_no:
        # Filter
        data1 = dataset[(dataset['eqmt_id'] == i) & (dataset['brand_brew_no'] == j)]
        min_date = data1.Timestamp.min()
        max_date = data1.Timestamp.max()
        for ind, k in data1.iterrows():
            # If first row, append as it is
            if k['Timestamp'] == min_date:
                final_ds.append(k)
            # If last row, just pass
            elif k['Timestamp'] == max_date:
                print('b')
                pass
            # If next row timestamp does not match continuity, create rows
            elif k['Timestamp'] != date_thirty:
                z = k
                z.Timestamp = date_thirty
                print(z)
                final_ds.append(z)
            # If matching continuity, append directly
            elif k['Timestamp'] == date_thirty:
                final_ds.append(k)
            # Increase the expected time by 30 min on every iteration
            date_thirty = k['Timestamp'] + timedelta(minutes=30)
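One thing that may be worth checking in the loop above: iterating over dataset.eqmt_id and dataset.brand_brew_no visits every row's value rather than each unique pair, so the same group gets processed many times. A minimal sketch of locating the gaps per group with groupby and diff follows. The sample values and the value and missing_before columns are hypothetical; only the eqmt_id, brand_brew_no, and Timestamp column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
dataset = pd.DataFrame({
    "eqmt_id": ["A", "A", "A", "A"],
    "brand_brew_no": [1, 1, 1, 1],
    "Timestamp": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 00:30",
        "2021-01-01 01:00", "2021-01-01 02:00",
    ]),
    "value": [10, 11, 12, 13],  # made-up measurement column
})

step = pd.Timedelta(minutes=30)
keys = ["eqmt_id", "brand_brew_no"]

# Sort so that diff() compares each row with the previous row of its group.
dataset = dataset.sort_values(keys + ["Timestamp"]).reset_index(drop=True)
gap = dataset.groupby(keys)["Timestamp"].diff()

# Number of 30-minute rows missing before each row (0 when continuous).
# The first row of each group has no predecessor, so its NaN becomes 0.
dataset["missing_before"] = (gap / step - 1).fillna(0).round().astype(int)
print(dataset)
```

Rows with a non-zero missing_before mark where rows would need to be inserted.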
Edit 1: # dfz is the main df
appended_data = []
for i in df.eqmt_id:
    for j in df.brand_brew_no:
        df = dfz[(dfz['eqmt_id'] == i) & (dfz['brand_brew_no'] == j)]
        df.set_index(pd.to_datetime(df['Timestamp']), inplace=True)
        df2 = df.reindex(
            pd.date_range(df.index.min(), df.index.max(), freq='30min')
        ).fillna(method='ffill')
        temp = df2.reset_index()
        appended_data.append(temp)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-44-a1a0fbad32b2> in <module>
4
5 df2.set_index(pd.to_datetime(df2['Timestamp']), inplace=True)
----> 6 df2 = pd.DataFrame(df2.reindex(pd.date_range(df2.Timestamp.min(), df2.Timestamp.max(), freq='30min')).fillna(method='ffill'))
7 temp = df2.reset_index()
Error : ValueError: cannot reindex from a duplicate axis
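pandas raises "cannot reindex from a duplicate axis" when the index being reindexed contains repeated labels, so one possible cause here is that some timestamps occur more than once in the frame being reindexed. A minimal sketch under that assumption: drop the repeated timestamps within each (eqmt_id, brand_brew_no) group, then reindex onto a full 30-minute range and forward-fill. The sample values, the value column, and the fill_gaps helper are hypothetical, and keeping the last duplicate is an arbitrary choice.

```python
import pandas as pd

# Hypothetical data; note the duplicated 00:30 row and the missing 01:30 row.
dfz = pd.DataFrame({
    "eqmt_id": ["A"] * 5,
    "brand_brew_no": [1] * 5,
    "Timestamp": pd.to_datetime([
        "2021-01-01 00:00", "2021-01-01 00:30", "2021-01-01 00:30",
        "2021-01-01 01:00", "2021-01-01 02:00",
    ]),
    "value": [10, 11, 11, 12, 13],  # made-up measurement column
})

keys = ["eqmt_id", "brand_brew_no"]


def fill_gaps(group):
    """Hypothetical helper: make one group continuous at 30-minute steps."""
    # Keep one row per timestamp so the index is unique before reindexing.
    group = group.drop_duplicates("Timestamp", keep="last").set_index("Timestamp")
    full_range = pd.date_range(group.index.min(), group.index.max(), freq="30min")
    # Missing timestamps become NaN rows, which ffill copies from the row above.
    return group.reindex(full_range).ffill().rename_axis("Timestamp").reset_index()


# groupby yields each unique (eqmt_id, brand_brew_no) pair exactly once.
filled = pd.concat(
    [fill_gaps(g) for _, g in dfz.groupby(keys)],
    ignore_index=True,
)
print(filled)
```

Because reindexing introduces NaN before the fill, integer columns come back as floats; they can be cast back afterwards if that matters.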