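I am trying to speed up the code below, which produces a list of lists with a different type for each column. I originally created a pandas DataFrame and then converted it to a list, but this seems quite slow. How can I create this list faster, say by an order of magnitude? All columns are constant except one.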
import pandas as pd
import numpy as np
import time
import datetime

def overflow_check(x):
    # in SQL code the column is decimal(13, 2)
    p = 13
    s = 3
    max_limit = float("9"*(p-s) + "." + "9"*s)
    #min_limit = 0.01 #float("0" + "." + "0"*(s-2) + '1')
    #min_limit = 0.1
    if np.logical_not(isinstance(x, np.ndarray)) or len(x) < 1:
        raise Exception("Non-numeric or empty array.")
    else:
        #print(x)
        return x * (np.abs(x) < max_limit) + np.sign(x) * max_limit * (np.abs(x) >= max_limit)

def list_creation(y_forc):
    backcast_length = len(y_forc)
    backcast = pd.DataFrame(data=np.full(backcast_length, 2),
                            columns=['TypeId'])
    backcast['id2'] = None
    backcast['Daily'] = 1
    backcast['ForecastDate'] = y_forc.index.strftime('%Y-%m-%d')
    backcast['ReportDate'] = pd.to_datetime('today').strftime('%Y-%m-%d')
    backcast['ForecastMethodId'] = 1
    backcast['ForecastVolume'] = overflow_check(y_forc.values)
    backcast['CreatedBy'] = 'test'
    backcast['CreatedDt'] = pd.to_datetime('today')
    return backcast.values.tolist()

i = pd.date_range('05-01-2010', '21-05-2018', freq='D')
x = pd.DataFrame(index=i, data=np.random.randint(0, 100, len(i)))
t = time.perf_counter()
y = list_creation(x)
print(time.perf_counter() - t)
This should be a bit faster; it just builds the list directly:
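The answer's original code did not survive on this page, so the following is only a sketch of what building the list directly might look like. The name `list_creation_direct` is hypothetical, and `np.clip` is used here as a stand-in for the question's `overflow_check` (it clamps to the same bound, assuming the input is a single numeric column).

```python
import numpy as np
import pandas as pd

# Same bound as overflow_check in the question (p=13, s=3).
MAX_LIMIT = float("9" * 10 + "." + "9" * 3)

def list_creation_direct(y_forc):
    # Compute the constant values once instead of filling DataFrame columns.
    created_dt = pd.to_datetime('today')
    report_date = created_dt.strftime('%Y-%m-%d')
    # Flatten the single data column and clamp it to +/- MAX_LIMIT
    # (assumption: np.clip replaces overflow_check here).
    volumes = np.clip(y_forc.to_numpy().ravel().astype(float),
                      -MAX_LIMIT, MAX_LIMIT).tolist()
    # One row per date; each Timestamp is formatted individually.
    return [
        [2, None, 1, ts.strftime('%Y-%m-%d'), report_date, 1, v, 'test', created_dt]
        for ts, v in zip(y_forc.index, volumes)
    ]
```

The column order follows the DataFrame in the question: TypeId, id2, Daily, ForecastDate, ReportDate, ForecastMethodId, ForecastVolume, CreatedBy, CreatedDt.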
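Edit: a big part of the slowdown is the time it takes to convert the datetimes to the specified string format. If we can get rid of that by rephrasing it as follows: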
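The rephrased code is missing from this page. One possible way to avoid per-element `strftime` calls, not necessarily the one the answer used, is NumPy's `np.datetime_as_string`, which renders a whole `datetime64` array at once. The helper name `format_dates` is hypothetical, and the sketch assumes a timezone-naive `DatetimeIndex`.

```python
import numpy as np
import pandas as pd

def format_dates(index):
    # Render the datetime64 values at day precision, giving 'YYYY-MM-DD'
    # strings, then convert the NumPy string array to a list of Python str.
    return np.datetime_as_string(index.values, unit='D').tolist()
```

For a daily index this should give the same strings as `index.strftime('%Y-%m-%d')`.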
then it is now much faster:
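The final code and its timing are also missing here. As a rough sketch under my own assumptions, this combines a direct list build with array-level date formatting and times it the same way as the question. `list_creation_fast` is a hypothetical name, `np.clip` again stands in for `overflow_check`, and the date range is written in ISO form on the assumption that the question meant 1 May 2010 to 21 May 2018.

```python
import time
import numpy as np
import pandas as pd

def list_creation_fast(y_forc):
    created_dt = pd.to_datetime('today')
    report_date = created_dt.strftime('%Y-%m-%d')
    # Same bound as overflow_check in the question (p=13, s=3).
    max_limit = float("9" * 10 + "." + "9" * 3)
    # Clamp the single data column (assumption: np.clip replaces overflow_check).
    volumes = np.clip(y_forc.to_numpy().ravel().astype(float),
                      -max_limit, max_limit).tolist()
    # Format all dates in one array operation instead of one strftime per row.
    dates = np.datetime_as_string(y_forc.index.values, unit='D').tolist()
    return [
        [2, None, 1, d, report_date, 1, v, 'test', created_dt]
        for d, v in zip(dates, volumes)
    ]

i = pd.date_range('2010-05-01', '2018-05-21', freq='D')
x = pd.DataFrame(index=i, data=np.random.randint(0, 100, len(i)))
t = time.perf_counter()
y = list_creation_fast(x)
print(time.perf_counter() - t)
```

The actual speedup depends on the machine and the pandas and NumPy versions, so it is worth timing this against `list_creation` on your own data.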