What is the fastest Python collection type when you know where the data is located in the collection?

Posted 2024-09-30 04:36:32


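I need to loop over several datasets that have the same size and structure (10,000 records each) and, based on user input, pull data from different points in each dataset. I then run calculations on that data and continue the loop. The calculations are complex and involve multiple datasets (I loop through all of them at the same time). Given the user input, combined with the data in the row I am currently on, I know exactly where in each dataset to fetch the data the calculation needs, for example: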

```python
x = list[4][3]
```
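To make the access pattern concrete, here is a minimal sketch assuming the datasets are held as nested lists of identical shape. The names `make_dataset`, `datasets`, `data` and `lookup_ahead` are hypothetical; `data` replaces `list` because a variable called `list` would shadow the built-in.

```python
# Hypothetical stand-ins for the datasets: same number of rows and columns
# in each, stored as nested lists (rows x columns).
def make_dataset(scale, rows=6, cols=5):
    return [[scale * (r * 10 + c) for c in range(cols)] for r in range(rows)]


datasets = [make_dataset(1), make_dataset(2), make_dataset(3)]

# The question's `list[4][3]`: row 4, column 3, i.e. two constant-time index operations.
data = datasets[0]
x = data[4][3]


def lookup_ahead(datasets, row, offset, col):
    """Read column `col` at row `row + offset` from every dataset; None past the end."""
    target = row + offset
    return [d[target][col] if target < len(d) else None for d in datasets]


# Walk all datasets in lockstep and pull a value two rows ahead of the current row.
pulled = [lookup_ahead(datasets, i, 2, 3) for i in range(len(data))]
```

Because the position is known in advance, no searching happens: each lookup is only indexing, whichever container type ends up holding the data.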

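Currently I iterate over the DataFrame, and if the information in a row matches the user's criteria, I loop through and perform the calculations stored in a list of class instances. These calculations use `iloc` to pull data from known positions in the DataFrame.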
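For comparison, a small sketch (toy data, hypothetical column values) of three ways to read one known cell. `df.iloc[i]["Close"]` first materializes the whole row as a Series and then picks a field. `df.iat[...]` and indexing an array pulled out once with `to_numpy()` read the cell directly. Whether these help in the real code is an assumption to be measured.

```python
import pandas as pd

df = pd.DataFrame({
    "Close": [10.0, 11.0, 12.5, 12.0],
    "High": [10.5, 11.2, 13.0, 12.4],
    "Low": [9.8, 10.7, 12.1, 11.6],
})

i = 2
# Pattern from the question: builds a whole-row Series, then selects one field.
via_row = df.iloc[i]["Close"]

# Scalar accessor: reads a single cell by integer position, no row Series built.
close_col = df.columns.get_loc("Close")
via_iat = df.iat[i, close_col]

# Extract the column once (outside any loop) as a NumPy array, then index it.
closes = df["Close"].to_numpy()
via_array = closes[i]

# Midpoint of High and Low from pre-extracted arrays instead of two iloc row lookups.
highs, lows = df["High"].to_numpy(), df["Low"].to_numpy()
mid = (highs[i] + lows[i]) / 2
```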

I tested iterating over the DataFrame with `iterrows`, `itertuples`, and `for i in df.index`, and found `df.index` to be the fastest.
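A sketch of such a comparison, assuming a single numeric column; the function names are hypothetical. It also includes a fourth variant that converts the column to a plain list once. Note that `df.iloc[i]` with `i` taken from `df.index` only lines up when the index is the default `RangeIndex`. Timings depend on the machine and data, so none are shown here.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"Close": [float(v) for v in range(1000)]})


def by_iterrows(df):
    return sum(row["Close"] for _, row in df.iterrows())


def by_itertuples(df):
    return sum(row.Close for row in df.itertuples(index=False))


def by_index_iloc(df):
    # Assumes a default RangeIndex so labels equal positions.
    return sum(df.iloc[i]["Close"] for i in df.index)


def by_list(df):
    closes = df["Close"].tolist()  # convert once, then plain Python indexing
    return sum(closes[i] for i in range(len(closes)))


FUNCS = (by_iterrows, by_itertuples, by_index_iloc, by_list)
results = {f.__name__: f(df) for f in FUNCS}

if __name__ == "__main__":
    for f in FUNCS:
        print(f.__name__, timeit.timeit(lambda: f(df), number=3))
```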

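I could switch all the collections to multidimensional lists (or even lists of class instances). None of the collections ever needs to be searched, because I know the exact position of every value I need and the data itself is produced by calculation. I believe I have already vectorized the data as much as possible at this point.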
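The conversion itself is cheap to try. Below is a minimal sketch of three layouts built from one DataFrame: rows as lists plus a column-position map, one list per column, and immutable named tuples per row. The column name `MA50` and the helper names `col`, `columns` and `Bar` are hypothetical; the question builds its MA column names as `'MA' + str(ci.ma)`.

```python
import pandas as pd

df = pd.DataFrame({"Close": [10.0, 11.0], "MA50": [9.5, 10.0]})

# Option A: list of rows plus a name -> position map.
rows = df.to_numpy().tolist()
col = {name: pos for pos, name in enumerate(df.columns)}
close_1 = rows[1][col["Close"]]

# Option B: one plain list per column (column-oriented).
columns = {name: df[name].tolist() for name in df.columns}
ma_1 = columns["MA50"][1]

# Option C: immutable named tuples, one per row (a "list of class instances").
records = list(df.itertuples(index=False, name="Bar"))
close_0 = records[0].Close
```

Option B tends to suit code like the question's, which repeatedly reads one field at a row offset rather than whole rows.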

Currently, depending on the user's input, a run can take more than an hour, so I need to optimize wherever I can.

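Before I do that (it is quite a job), I would like to know: what is the fastest collection in Python to loop over and pull known data points from? It can be immutable. I am leaning towards a multidimensional list. Am I wrong?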
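One way to answer this for a specific machine is a micro-benchmark at the question's scale. The sketch below assumes 10,000 rows and an arbitrary 8 columns and reads the same random known positions from a list of lists, a tuple of tuples, and a 2-D NumPy array. Per-element NumPy reads from a Python loop are typically slower than list indexing because each read creates a NumPy scalar, while a single batched gather avoids the Python loop entirely; the printed timings are what to trust, not this expectation.

```python
import random
import timeit

import numpy as np

ROWS, COLS = 10_000, 8  # 10,000 records as in the question; 8 columns is an assumption

nested_list = [[float(r * COLS + c) for c in range(COLS)] for r in range(ROWS)]
nested_tuple = tuple(tuple(row) for row in nested_list)
array_2d = np.array(nested_list)

# Fixed seed so every container is read at the same "known" positions.
rng = random.Random(0)
positions = [(rng.randrange(ROWS), rng.randrange(COLS)) for _ in range(100_000)]


def read_nested(data):
    """One element at a time from a list of lists or a tuple of tuples."""
    return sum(data[r][c] for r, c in positions)


def read_numpy_scalar(arr):
    """One element at a time from a 2-D array; each read creates a NumPy scalar."""
    return sum(arr[r, c] for r, c in positions)


def read_numpy_batch(arr):
    """All positions in one vectorized gather, if they can be collected up front."""
    rows, cols = zip(*positions)
    return float(arr[list(rows), list(cols)].sum())


if __name__ == "__main__":
    for name, fn, data in [
        ("list of lists", read_nested, nested_list),
        ("tuple of tuples", read_nested, nested_tuple),
        ("numpy, per element", read_numpy_scalar, array_2d),
        ("numpy, batched", read_numpy_batch, array_2d),
    ]:
        print(f"{name:20s} {timeit.timeit(lambda: fn(data), number=5):.3f}s")
```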

Edit:

Here is one of the main loop functions:

```python
from decimal import Decimal

import pandas as pd


# Method of the asker's tool class; GU is the asker's own module that holds stockList.
def runOnce(self, calcIterations):
    # build the output column names
    columnNames = ["Decision Date"]
    for stock in GU.stockList:
        columnNames.append(stock.name + " price")
        columnNames.append(stock.name + " % from MA")
    columnNames.append("Trade Date")
    columnNames.append("TDate Value")
    columnNames.append("MOVEMENT")
    # collect output rows in a list; DataFrame.append was removed in pandas 2.0
    outRows = []

    # loop over each day
    totalDays = len(GU.stockList[0].df)
    monthCount = 0
    createData = True
    if len(calcIterations) > 1:
        # do not create daily data if doing more than one run
        createData = False

    for i in GU.stockList[0].df.index:
        iDate = GU.stockList[0].df.iloc[i]['Date']

        # stop at the last row
        if i + 1 == totalDays:
            break
        # calculations only happen on the first trading day of a new month
        if iDate.month != GU.stockList[0].df.iloc[i - 1]['Date'].month:
            monthCount += 1
            # loop through the iterations doing the calculations
            for ci in calcIterations:
                if monthCount % int(ci.cycle) != 0:
                    # not the right month in the cycle, no calculations
                    continue
                if ci.trade + i >= totalDays:
                    # not enough data left for this ci
                    continue
                # percent difference from the moving average for each stock
                lStock = GU.stockList[0]
                lStock.percentFromMA = 10000
                outRow = [str(lStock.df.iloc[i + ci.decision]['Date'].date())]  # decision date
                for stock in GU.stockList:
                    c = stock.df.iloc[i + ci.decision]['Close']
                    outRow.append(c)  # close value
                    ma = Decimal(stock.df.iloc[i + ci.decision]['MA' + str(ci.ma)])
                    stock.percentFromMA = (c / ma - 1) * 100
                    outRow.append(stock.percentFromMA)  # percent from MA
                    # keep the stock furthest below its MA
                    if stock.percentFromMA < lStock.percentFromMA:
                        lStock = stock
                # percent difference is complete, do the sell and buy
                outRow.append(str(lStock.df.iloc[i + ci.trade]['Date'].date()))  # trade date
                buySharePrice = (lStock.df.iloc[i + ci.trade]['High'] + lStock.df.iloc[i + ci.trade]['Low']) / Decimal(2)
                comments = ""
                sStock = lStock
                if lStock.name == ci.stockName:
                    # same stock, do nothing
                    outRow.append(ci.shareNum * buySharePrice + ci.residule)
                    outRow.append("Same stock no change")
                    if len(calcIterations) == 1:
                        # only 1 run, so record the row for the CSV
                        outRows.append(outRow)
                    continue
                if ci.shareNum != 0:
                    # sell before buying: find the currently held stock and its trading price
                    for stock in GU.stockList:
                        if ci.stockName == stock.name:
                            sStock = stock
                    sellSharePrice = (sStock.df.iloc[i + ci.trade]['High'] + sStock.df.iloc[i + ci.trade]['Low']) / Decimal(2)
                    ci.currentAmount = sellSharePrice * ci.shareNum + ci.residule
                    ci.currentAmount = ci.currentAmount - ci.fees
                    ci.feesPaid = ci.feesPaid + ci.fees
                    comments += "Sold " + sStock.name + ":" + str(ci.shareNum) + " shares at $" + str(sellSharePrice) + " and $" + str(ci.residule) + " residue : "
                # buy stock at trade price
                ci.currentAmount = ci.currentAmount - ci.fees
                ci.feesPaid = ci.feesPaid + ci.fees
                ci.shareNum = int(ci.currentAmount / buySharePrice)
                ci.residule = ci.currentAmount - (ci.shareNum * buySharePrice)
                ci.stockName = lStock.name
                outRow.append(ci.currentAmount)  # current worth on trade day
                comments += "Purchased " + lStock.name + ":" + str(ci.shareNum) + " shares at $" + str(buySharePrice) + " and $" + str(ci.residule) + " residue."
                outRow.append(comments)  # movement
                if len(calcIterations) == 1:
                    # only 1 run, so record the row for the CSV
                    outRows.append(outRow)
    if len(calcIterations) == 1:
        # write the CSV
        pd.DataFrame(outRows, columns=columnNames).to_csv("OneRun.csv", index=False)
```
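Much of the per-row work in the loop above (detecting the first trading day of a month, computing each stock's percent distance from its MA, and picking the stock furthest below its MA) depends only on columns that exist before the loop starts. A sketch of computing those once for every row, assuming two toy stocks with aligned dates and a hypothetical `MA20` column. It uses floats rather than the question's `Decimal`, and the helper name `precompute` is hypothetical. One behavioral difference: `shift()` flags row 0 as a new month, whereas `iloc[i - 1]` at `i = 0` compares row 0 against the last row.

```python
import numpy as np
import pandas as pd


def precompute(df, ma_col):
    """Percent distance from the moving average for every row (hypothetical helper)."""
    return ((df["Close"] / df[ma_col] - 1) * 100).to_numpy()


dates = pd.to_datetime(["2020-01-30", "2020-01-31", "2020-02-03", "2020-02-04", "2020-03-02"])
stock_a = pd.DataFrame({"Date": dates,
                        "Close": [100.0, 102.0, 101.0, 99.0, 105.0],
                        "MA20": [98.0, 100.0, 101.0, 100.0, 100.0]})
stock_b = pd.DataFrame({"Date": dates,
                        "Close": [50.0, 49.0, 52.0, 50.0, 48.0],
                        "MA20": [50.0, 50.0, 50.0, 50.0, 50.0]})

# True on rows whose month differs from the previous row's month.
month = stock_a["Date"].dt.month
is_new_month = month.ne(month.shift()).to_numpy()
decision_rows = np.flatnonzero(is_new_month).tolist()

# Shape (n_stocks, n_rows); argmin picks the stock furthest below its MA on each row.
pct = np.vstack([precompute(stock_a, "MA20"), precompute(stock_b, "MA20")])
lowest = pct.argmin(axis=0)
```

The remaining loop would then only visit `decision_rows` and index into these arrays (or lists made from them) by position, instead of calling `iloc` several times per stock per row.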

Edit 2: In case it helps, here is a function profile of a short run:

```
ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   320477   53.627    0.000  134.463    0.000 managers.py:878(fast_xs)
 33971951   22.461    0.000   28.917    0.000 {built-in method builtins.isinstance}
   640954   16.092    0.000   20.291    0.000 numerictypes.py:578(_can_coerce_all)
  3204912    7.762    0.000   10.305    0.000 common.py:1886(_is_dtype_type)
320482/320481    7.536    0.000  110.313    0.000 series.py:197(__init__)
  2884386    7.115    0.000   12.014    0.000 common.py:1743(is_extension_array_dtype)
   320496    6.162    0.000   61.699    0.000 construction.py:630(sanitize_array)
  1281919    5.521    0.000   15.087    0.000 common.py:1619(is_bool_dtype)
        1    5.339    5.339  291.905  291.905 FTATool.py:107(runOnce)
   320477    5.298    0.000  254.900    0.001 frame.py:2916(_ixs)
   320496    5.190    0.000   42.226    0.000 construction.py:759(_try_cast)
12837927/10915015    5.135    0.000    6.070    0.000 {built-in method builtins.len}
```
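The table has the column layout of Python's standard `cProfile`/`pstats` output sorted by `tottime`. The top entries (`frame.py` `_ixs`, `fast_xs`, `Series.__init__`, `sanitize_array`) appear to be pandas internals reached through row access such as `df.iloc[i]`. A minimal sketch for producing the same kind of report around any function, where `work` is a hypothetical stand-in for `runOnce`:

```python
import cProfile
import io
import pstats


def work():
    # Hypothetical stand-in for the code being profiled.
    return sum(i * i for i in range(100_000))


profiler = cProfile.Profile()
profiler.enable()
result = work()
profiler.disable()

buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("tottime")
stats.print_stats(10)  # top 10 rows: ncalls, tottime, percall, cumtime, percall
report = buffer.getvalue()
print(report)
```

Re-running the profile after each change shows whether the pandas row-construction entries shrink.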

Thanks

