我有一个数据框“dfSummary”-
exchangeBalances = [['ETHBTC','binance',10], ['LTCBTC','binance',10], ['XRPBTC','binance',10], ['ETHBTC','bitfinex',10], ['LTCBTC','bitfinex',10], ['XRPBTC','bitfinex',10]]
bidOffers = [
['ETHBTC','binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 15)], ['LTCBTC','binance',0.009,0.092, datetime(2018, 9, 1, 8, 15)], ['XRPBTC','binance',0.000077, 0.000078, datetime(2018, 9, 1, 8, 15)], ['ETHBTC','bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 15)], ['LTCBTC','bitfinex',0.0093,0.095, datetime(2018, 9, 1, 8, 15)], ['XRPBTC','bitfinex',0.000083, 0.000085, datetime(2018, 9, 1, 8, 15)],
['ETHBTC','binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 30)], ['LTCBTC','binance',0.009,0.092, datetime(2018, 9, 1, 8, 30)], ['XRPBTC','binance',0.000077, 0.000078, datetime(2018, 9, 1, 8, 30)], ['ETHBTC','bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 30)], ['LTCBTC','bitfinex',0.0093,0.095, datetime(2018, 9, 1, 8, 30)], ['XRPBTC','bitfinex',0.000083, 0.000085, datetime(2018, 9, 1, 8, 30)],
['ETHBTC','binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 45)], ['LTCBTC','binance',0.009,0.092, datetime(2018, 9, 1, 8, 45)], ['XRPBTC','binance',0.000077, 0.000078, datetime(2018, 9, 1, 8, 45)], ['ETHBTC','bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 45)], ['LTCBTC','bitfinex',0.0093,0.095, datetime(2018, 9, 1, 8, 45)], ['XRPBTC','bitfinex',0.000083, 0.000085, datetime(2018, 9, 1, 8, 45)]
]
dfExchangeBalances = pd.DataFrame(exchangeBalances, columns=['symbol','exchange','balance'])
dfBidOffers = pd.DataFrame(bidOffers, columns=['symbol','exchange','bid', 'offer', 'created'])
dfBidOffers["spread"] = dfBidOffers["bid"] - dfBidOffers["offer"]
dfSummary = dfExchangeBalances.merge(dfBidOffers, how='left', on=['symbol','exchange'])
我需要完成的是,在“dfSummary”中添加一个计算字段:
^{pr2}$“someOtherRow”是基于“created”的查找(例如,最后一行具有相同的{symbol,exchange},但在30分钟前“created”(与“currentRow”相比)
澄清:上面的例子是手头实际问题的简化。间隔时间不是15分钟。事实上,我需要在DataFrame中查找相应的记录(同一个key={symbol,exchange}),但是第一个这样的记录是在第一个月、第一个季度和第一年创建的。在
我尽量避免手动循环数据帧.iter使用Pandas内置的查找(矢量化)
我在想DataFrame。查找Vectorized look-up of values in Pandas dataframe 但不知道如何从计算字段的上下文中使用它。。。?另外,我希望对相同的数据帧进行查找,而不是针对不同的数据帧进行查找
提前谢谢!在
矢量化(熊猫和Numpy-vs循环):
https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
https://www.datascience.com/blog/straightening-loops-how-to-vectorize-data-aggregation-with-pandas-and-numpy/https://realpython.com/numpy-array-programming/
我明白了,这是我的真实代码(所以我不会把所有东西都贴出来)。这将起作用(但不确定是否以最快的方式实现)。在
我在用数据帧应用。这不是矢量化的方法,但应该比python中的循环快得多。有人能不能请你解释一下如何用完全矢量化的方式重写下面的内容?在
引用本文-https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
。。。我无法用矢量化的方式重写,而且考虑到查找的性质,我开始觉得下面的内容不可能是矢量化的(如果你们中有人能证明我错了,我很高兴):
查找函数包括:
^{pr2}$这是假设
created
具有恒定的15分钟间隔。您可以groupby
符号和交换,并下移2(2个时段,因为每个时段是15分钟):输出:
^{pr2}$矢量化!!!!!!!!(嗯。。。大部分情况下)
想法是,使用“merge”(self-join),就像“DataFrame.lookup”一样,这是针对完全不同的应用程序,例如:Pandas DataFrame.lookup
从原始修复扩展。。。在
步骤1)利润损失.py\预先计算TM1、MonthStart、QuarterStart、yearlstart,因为无论如何都要调用它。在
第2步)合并(即自连接),而不是DataFrame。应用或DataFrame。查找:
^{pr2}$实际上,我不确定merge/self-join是否比显式循环更高效。另外,我还不知道该怎么做,尽管夏普比率和MaxDrawdown!!熊猫的窗口功能似乎没有帮助。。。在
人?!谢谢!!在
相关问题 更多 >
编程相关推荐