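I need to iterate over data sets of the same size and structure (10,000 records each) and, depending on user input, pull data from different points in the data set. I then run calculations on that data and continue looping. The calculations are complex and involve multiple data sets (all of them are looped through at the same time). Given the user input, combined with the data in the row I am on, I know exactly where in the data set to get the data needed for the calculation, for example: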
x = list[4][3]
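Currently I iterate over a dataframe, and if the information in the row matches the user's criteria, I loop through and perform calculations that are stored in a list of classes. These calculations involve pulling data from known points in the dataframes using iloc.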
I tested iterating over the dataframe with iterrows, itertuples, and for i in df.index, and found df.index to be the fastest.
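A minimal sketch of how the three loop styles mentioned above could be timed against each other. The DataFrame, its `Close` column, and the function names are made up for illustration; relative timings depend on the pandas version and on what is done inside the loop body, so no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in data: 10,000 rows with a single 'Close' column.
df = pd.DataFrame({"Close": np.arange(10_000, dtype=float)})


def loop_iterrows(frame):
    """Sum 'Close' by walking the frame with iterrows()."""
    total = 0.0
    for _, row in frame.iterrows():
        total += row["Close"]
    return total


def loop_itertuples(frame):
    """Sum 'Close' by walking the frame with itertuples()."""
    total = 0.0
    for row in frame.itertuples(index=False):
        total += row.Close
    return total


def loop_index_iloc(frame):
    """Sum 'Close' by walking frame.index and reading each row with iloc."""
    total = 0.0
    for i in frame.index:
        total += frame.iloc[i]["Close"]
    return total


# Time one full pass of each style over the same data.
for fn in (loop_iterrows, loop_itertuples, loop_index_iloc):
    seconds = timeit.timeit(lambda: fn(df), number=1)
    print(f"{fn.__name__}: {seconds:.3f}s for one pass")
```

Note that the third variant includes the cost of the `iloc` lookup itself, not just the cost of walking `df.index`.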
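I could switch all of the collections to multidimensional lists (or even lists of classes), since nothing needs to be searched: I know the exact position in the collection, which is found by calculation. I believe I have already vectorized the data as much as I can at this point.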
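A rough sketch of what such a switch could look like, assuming a DataFrame with `Close`, `High`, and `Low` columns (the data here is invented, and `rows`, `mid_price`, and the position constants are hypothetical names). The conversion happens once, before the loop, and every later access is by position only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one stock's data; real columns may differ.
df = pd.DataFrame(
    {
        "Close": np.linspace(10.0, 20.0, 10_000),
        "High": np.linspace(11.0, 21.0, 10_000),
        "Low": np.linspace(9.0, 19.0, 10_000),
    }
)

# One-time conversion to a list of [close, high, low] rows.
rows = df[["Close", "High", "Low"]].to_numpy().tolist()
CLOSE, HIGH, LOW = 0, 1, 2  # column positions inside each inner list


def mid_price(table, i):
    """Average of High and Low at row i, read purely by position."""
    return (table[i][HIGH] + table[i][LOW]) / 2


# The positional lookup gives the same value as the iloc-based lookup.
i = 4
via_iloc = (df.iloc[i]["High"] + df.iloc[i]["Low"]) / 2
print(mid_price(rows, i), via_iloc)
```

Each `df.iloc[i]["High"]` builds a temporary Series for the whole row before picking one value, whereas `rows[i][HIGH]` is two plain list index operations.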
Currently, depending on the user's input, this can take over an hour, so I need to optimize wherever I can.
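Before I do this (it is a sizable task), I would like to know: what is the fastest collection in Python to loop over and pull known data points from? It can be immutable. I am leaning towards multidimensional lists. Am I wrong?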
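One way to check this empirically is a small standard-library benchmark such as the sketch below. The shape (10,000 rows of 6 fields), the generated values, and the helper name `sum_column` are assumptions for illustration; it compares a mutable list of lists with an immutable tuple of tuples for the same positional access pattern.

```python
import timeit

ROWS, COLS = 10_000, 6  # hypothetical shape: 10,000 records, 6 fields each

# The same data stored two ways.
as_lists = [[float(r * COLS + c) for c in range(COLS)] for r in range(ROWS)]
as_tuples = tuple(tuple(row) for row in as_lists)  # immutable variant


def sum_column(table, col):
    """Walk every row and pull one known position, like x = table[4][3]."""
    total = 0.0
    for r in range(len(table)):
        total += table[r][col]
    return total


# Time 100 full passes over each container.
for name, table in (("list of lists", as_lists), ("tuple of tuples", as_tuples)):
    seconds = timeit.timeit(lambda: sum_column(table, 3), number=100)
    print(f"{name}: {seconds:.3f}s for 100 passes")
```

Both containers support the same `table[4][3]` style of access, so the calculation code would not need to change between them.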
Edit:
Here is one of the main loop functions:
def runOnce(self,calcIterations):
    try:
        #create dataframe and dictionary
        columnNames = ["Decision Date"]
        for stock in GU.stockList:
            columnNames.append(stock.name + " price")
            columnNames.append(stock.name + " % from MA")
        columnNames.append("Trade Date")
        columnNames.append("TDate Value")
        columnNames.append("MOVEMENT")
        outdf = pd.DataFrame(columns = columnNames)
        #loop over each day
        totalDays = len(GU.stockList[0].df)
        monthCount = 0
        createData = True
        if len(calcIterations) > 1:
            #do not create daily data if doing more than one run
            createData = False
        for i in GU.stockList[0].df.index:
            #check if all dates on that day are equal
            iDate = GU.stockList[0].df.iloc[i]['Date']
            #dates are equal
            #check if next date is a new month as calculations only happen on that day
            if(i+1 == totalDays):
                break
            if(iDate.month != GU.stockList[0].df.iloc[i-1]['Date'].month):
                #new month
                monthCount += 1
                #loop through the iterations doing the calculations
                for ci in calcIterations:
                    if(monthCount % int(ci.cycle) != 0):
                        #not right month cycle, no calculations
                        continue
                    if ci.trade+i >= totalDays:
                        #not enough data for this ci
                        continue
                    #time for calculations
                    #do percent difference
                    lStock = GU.stockList[0]
                    lStock.percentFromMA = 10000
                    outRow = [str(lStock.df.iloc[i+ci.decision]['Date'].date())] #Decision date
                    for stock in GU.stockList:
                        c = stock.df.iloc[i+ci.decision]['Close']
                        outRow.append(c) #close value
                        ma = Decimal(stock.df.iloc[i+ci.decision]['MA' + str(ci.ma)])
                        #calculate percent difference
                        stock.percentFromMA = (c / ma - 1) * 100
                        outRow.append(stock.percentFromMA) #percentfrom MA
                        #grab lowest stock
                        if stock.percentFromMA < lStock.percentFromMA:
                            lStock = stock
                    #percent difference is complete, do the sell and buy '{0:.2f}'.format(pi)
                    outRow.append(str(lStock.df.iloc[i+ci.trade]['Date'].date())) #trade date
                    buySharePrice = (lStock.df.iloc[i+ci.trade]['High'] + lStock.df.iloc[i+ci.trade]['Low']) / Decimal(2)
                    comments = ""
                    sStock = lStock
                    if lStock.name == ci.stockName:
                        #same stock do nothing
                        outRow.append(ci.shareNum * buySharePrice + ci.residule)
                        outRow.append("Same stock no change")
                        if len(calcIterations) == 1:
                            #only 1 run so do csv dataframe
                            outdf = outdf.append(pd.Series(outRow,index=outdf.columns),ignore_index=True)
                        continue
                    if ci.shareNum != 0:
                        #sell before buying
                        #get current stock and trading price
                        for stock in GU.stockList:
                            if ci.stockName == stock.name:
                                sStock = stock
                        sellSharePrice = (sStock.df.iloc[i+ci.trade]['High'] + sStock.df.iloc[i+ci.trade]['Low']) / Decimal(2)
                        ci.currentAmount = sellSharePrice * ci.shareNum + ci.residule
                        ci.currentAmount = ci.currentAmount - ci.fees
                        ci.feesPaid = ci.feesPaid + ci.fees
                        comments += "Sold " + sStock.name + ":" + str(ci.shareNum) + " shares at $" + str(sellSharePrice) + " and $" + str(ci.residule) + " residue : "
                    #buy stock at trade price
                    ci.currentAmount = ci.currentAmount - ci.fees
                    ci.feesPaid = ci.feesPaid + ci.fees
                    ci.shareNum = int(ci.currentAmount / buySharePrice)
                    ci.residule = ci.currentAmount - (ci.shareNum * buySharePrice)
                    ci.stockName = lStock.name
                    outRow.append(ci.currentAmount) #current worth on trade day
                    comments += "Purchased " + lStock.name + ":" + str(ci.shareNum) + " shares at $" + str(buySharePrice) + " and $" + str(ci.residule) + " residue."
                    outRow.append(comments) #movement
                    if len(calcIterations) == 1:
                        #only 1 run so do csv dataframe
                        outdf = outdf.append(pd.Series(outRow,index=outdf.columns),ignore_index=True)
        if len(calcIterations) == 1:
            #print the csv
            outdf.to_csv("OneRun.csv", index=False)
    except Exception:
        raise
Edit 2: Also, in case it helps, here is a function profile of a short run:
           ncalls  tottime  percall  cumtime  percall filename:lineno(function)
           320477   53.627    0.000  134.463    0.000 managers.py:878(fast_xs)
         33971951   22.461    0.000   28.917    0.000 {built-in method builtins.isinstance}
           640954   16.092    0.000   20.291    0.000 numerictypes.py:578(_can_coerce_all)
          3204912    7.762    0.000   10.305    0.000 common.py:1886(_is_dtype_type)
    320482/320481    7.536    0.000  110.313    0.000 series.py:197(__init__)
          2884386    7.115    0.000   12.014    0.000 common.py:1743(is_extension_array_dtype)
           320496    6.162    0.000   61.699    0.000 construction.py:630(sanitize_array)
          1281919    5.521    0.000   15.087    0.000 common.py:1619(is_bool_dtype)
                1    5.339    5.339  291.905  291.905 FTATool.py:107(runOnce)
           320477    5.298    0.000  254.900    0.001 frame.py:2916(_ixs)
           320496    5.190    0.000   42.226    0.000 construction.py:759(_try_cast)
12837927/10915015    5.135    0.000    6.070    0.000 {built-in method builtins.len}
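For anyone wanting to reproduce a listing in this format, the sketch below uses the standard-library `cProfile` and `pstats` modules. The listing above appears to be sorted by `tottime`, so the sketch sorts the same way. `busy_work` is a hypothetical stand-in for `runOnce`, and `profile_call` is an illustrative helper, not part of the original program.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical stand-in for runOnce: just burns some CPU."""
    return sum(i * i for i in range(n))


def profile_call(func, *args, top=12):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by tottime, i.e. time spent
    in each function itself, excluding the functions it calls.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(busy_work, 100_000)
print(report)
```

Sorting by `cumtime` instead would rank functions by the time spent in them including everything they call.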
Thanks