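I am trying to add values taken from one dataframe column by iterating over unique values (contract numbers). For a small number of iterations the script works perfectly. However, when I iterate over 1000 unique values, duplicate values are created in the resulting dataframe, which in turn slows processing down and wastes processing time. How can I make this more efficient?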
https://imgur.com/3obXPne - original dataframe
https://imgur.com/mEA8g6Z - unnecessary duplicates in the new dataframe
https://imgur.com/3i5gMoJ - unnecessary duplicates in the new dataframe
import pandas as pd
import numpy as np
from datetime import datetime

df = pd.DataFrame([["AB1111", '2018-08-15 00:00:00', '164', '123', '123'],
                   ["AB1111", '2018-08-15 00:03:00', '564', '453', '126'],
                   ["AB1111", '2018-08-15 00:10:00', '364', '1231', '1223'],
                   ["AB1111", '2018-08-15 00:01:00', '564', '575', '1523'],
                   ["CD1111", '2018-08-16 00:12:00', '514', '341', '1213'],
                   ["CD1111", '2018-08-15 00:02:00', '564', '1234', '123'],
                   ["CD1111", '2018-08-16 00:05:00', '564', '341', '124'],
                   ["CD1111", '2018-08-16 00:03:00', '64', '341', '123'],
                   ["EF1111", '2018-08-15 00:00:00', '534', '341', '121'],
                   ["EF1111", '2018-08-17 00:01:00', '564', '341', '163'],
                   ["EF1111", '2018-08-15 00:09:00', '524', '341', '129']],
                  columns=['contract', 'datetime',
                           'real_cons', 'solar_gen', 'battery_charge'])

# convert the datetime column to the datetime dtype
df['datetime'] = pd.to_datetime(df['datetime'])

# aggregation dataframe (new dataframe)
df_agg1 = pd.DataFrame()

for contract in df['contract'].unique()[:1500]:
    print(contract)
    df_contract = df.copy()[df['contract'] == contract]  # select the rows of this contract from the main DF
    df_contract.set_index('datetime', inplace=True)  # set the "datetime" column as the index
    df_contract.sort_index(inplace=True)  # sort the index
    df_contract = df_contract.loc['2018-8-15']  # select the timeframe
    # create the "<contract>_con" column (e.g. AB1111_con) from the 'real_cons' column;
    # it is the column that gets added to df_agg1
    df_contract[f'{contract}_con'] = df_contract['real_cons']
    if df_agg1.empty:
        df_agg1 = df_contract[[f'{contract}_con']]  # first column
    else:
        df_agg1 = df_agg1.join(df_contract[f'{contract}_con'])  # subsequent columns

df_agg1
How can I create the new dataframe without producing these unnecessary duplicates, and what causes them?
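One likely cause, assuming the real data contains repeated timestamps within a contract (the sample above does not, so this is an assumption about the full dataset): `DataFrame.join` matches rows on the index, and when the same index value occurs more than once on both sides, every left row is paired with every matching right row. Repeating the join for each contract then multiplies the rows again and again. The small sketch below uses made-up values and hypothetical column names `A_con` and `B_con` to show the effect and one way to avoid it.

```python
import pandas as pd

# Made-up data: the timestamp 00:00 appears twice in each frame.
idx = pd.to_datetime(['2018-08-15 00:00', '2018-08-15 00:00', '2018-08-15 00:01'])
left = pd.DataFrame({'A_con': [1, 2, 3]}, index=idx)
right = pd.DataFrame({'B_con': [10, 20, 30]}, index=idx)

# The two 00:00 rows on the left each match the two 00:00 rows on the right,
# giving 2 * 2 = 4 rows for 00:00 plus 1 row for 00:01.
joined = left.join(right)
print(len(left), len(right), len(joined))  # 3 3 5

# Keeping only the first row per timestamp before joining stops the growth.
left_unique = left[~left.index.duplicated(keep='first')]
right_unique = right[~right.index.duplicated(keep='first')]
deduped = left_unique.join(right_unique)
print(len(deduped))  # 2
```

Whether keeping the first row is the right choice depends on what the repeated timestamps mean in the real data; aggregating them (for example with a sum or mean) may be more appropriate.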
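Here is a way to get exactly the same result without using a for loop. I split it across multiple lines so that I could add explanations and keep it readable.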
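A minimal sketch of one loop-free approach, not necessarily the code this answer originally contained: filter the rows for the day, drop repeated (timestamp, contract) pairs, and reshape with `DataFrame.pivot` so that each contract becomes a column. The data is a trimmed copy of the question's sample (only the columns that are used), and the names `day`, `wide` and `like_loop` are illustrative.

```python
import pandas as pd

# Trimmed copy of the question's sample data.
df = pd.DataFrame([["AB1111", '2018-08-15 00:00:00', '164'],
                   ["AB1111", '2018-08-15 00:03:00', '564'],
                   ["AB1111", '2018-08-15 00:10:00', '364'],
                   ["AB1111", '2018-08-15 00:01:00', '564'],
                   ["CD1111", '2018-08-16 00:12:00', '514'],
                   ["CD1111", '2018-08-15 00:02:00', '564'],
                   ["CD1111", '2018-08-16 00:05:00', '564'],
                   ["CD1111", '2018-08-16 00:03:00', '64'],
                   ["EF1111", '2018-08-15 00:00:00', '534'],
                   ["EF1111", '2018-08-17 00:01:00', '564'],
                   ["EF1111", '2018-08-15 00:09:00', '524']],
                  columns=['contract', 'datetime', 'real_cons'])
df['datetime'] = pd.to_datetime(df['datetime'])

# Keep only the rows that fall on the selected day.
day = df[df['datetime'].dt.normalize() == pd.Timestamp('2018-08-15')]

# pivot needs unique (datetime, contract) pairs, so drop repeats first.
day = day.drop_duplicates(subset=['datetime', 'contract'])

# One row per timestamp, one column per contract.
wide = day.pivot(index='datetime', columns='contract', values='real_cons')
wide = wide.sort_index().add_suffix('_con')
wide.columns.name = None
print(wide)

# The loop left-joins onto the first contract, so it keeps only that
# contract's timestamps; this filter reproduces that behaviour.
first_col = f"{df['contract'].iloc[0]}_con"
like_loop = wide[wide[first_col].notna()]
print(like_loop)
```

Note that `wide` contains the union of all timestamps on that day, while `like_loop` keeps only the timestamps of the first contract, as the loop's left join does. Because the reshape happens in a single step, no repeated joins take place and rows cannot multiply.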