I have this DataFrame, "dfSummary":
from datetime import datetime
import pandas as pd
exchangeBalances = [['ETHBTC','binance',10], ['LTCBTC','binance',10], ['XRPBTC','binance',10], ['ETHBTC','bitfinex',10], ['LTCBTC','bitfinex',10], ['XRPBTC','bitfinex',10]]
bidOffers = [
['ETHBTC','binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 15)], ['LTCBTC','binance',0.009,0.092, datetime(2018, 9, 1, 8, 15)], ['XRPBTC','binance',0.000077, 0.000078, datetime(2018, 9, 1, 8, 15)], ['ETHBTC','bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 15)], ['LTCBTC','bitfinex',0.0093,0.095, datetime(2018, 9, 1, 8, 15)], ['XRPBTC','bitfinex',0.000083, 0.000085, datetime(2018, 9, 1, 8, 15)],
['ETHBTC','binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 30)], ['LTCBTC','binance',0.009,0.092, datetime(2018, 9, 1, 8, 30)], ['XRPBTC','binance',0.000077, 0.000078, datetime(2018, 9, 1, 8, 30)], ['ETHBTC','bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 30)], ['LTCBTC','bitfinex',0.0093,0.095, datetime(2018, 9, 1, 8, 30)], ['XRPBTC','bitfinex',0.000083, 0.000085, datetime(2018, 9, 1, 8, 30)],
['ETHBTC','binance', 0.0035, 0.0351, datetime(2018, 9, 1, 8, 45)], ['LTCBTC','binance',0.009,0.092, datetime(2018, 9, 1, 8, 45)], ['XRPBTC','binance',0.000077, 0.000078, datetime(2018, 9, 1, 8, 45)], ['ETHBTC','bitfinex', 0.003522, 0.0353, datetime(2018, 9, 1, 8, 45)], ['LTCBTC','bitfinex',0.0093,0.095, datetime(2018, 9, 1, 8, 45)], ['XRPBTC','bitfinex',0.000083, 0.000085, datetime(2018, 9, 1, 8, 45)]
]
dfExchangeBalances = pd.DataFrame(exchangeBalances, columns=['symbol','exchange','balance'])
dfBidOffers = pd.DataFrame(bidOffers, columns=['symbol','exchange','bid', 'offer', 'created'])
dfBidOffers["spread"] = dfBidOffers["bid"] - dfBidOffers["offer"]
dfSummary = dfExchangeBalances.merge(dfBidOffers, how='left', on=['symbol','exchange'])
What I need to do is add a calculated field to "dfSummary":
currentRow["Spread"] - someOtherRow["Spread"]
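"someOtherRow" is a lookup based on "created" (for example, the last row with the same {symbol, exchange} that was created 30 minutes before "currentRow").
Clarification: the example above is a simplification of the actual problem. The time intervals are not exactly 15 minutes. In fact, I need to look up the corresponding records in the DataFrame (same key = {symbol, exchange}), specifically the first such record created in the month, in the quarter, and in the year.
I am trying to avoid looping manually over the DataFrame rows and to use a built-in (vectorized) Pandas lookup instead.
I was thinking of DataFrame.lookup (see "Vectorized look-up of values in Pandas dataframe"), but I am not sure how to use it in the context of a calculated field. Also, I want the lookup to run against the same DataFrame, not against a different one.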
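Vectorization (Pandas and NumPy vs. loops):
I figured it out. This is my real code, so I will not post all of it. It works, but I am not sure it is implemented in the fastest way.
I am using DataFrame.apply. That is not vectorized, but it should be much faster than a plain Python loop. Can someone show how to rewrite the code below in a vectorized way?
Reference: https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6
... I could not rewrite it in a vectorized way, and given the nature of the lookup I am starting to think the code below cannot be vectorized (I would be happy if someone proved me wrong):
The lookup functions are: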
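The original lookup functions are not shown here. Purely as an illustration of the row-wise DataFrame.apply approach described above, a lookup of this kind might look like the sketch below. The helper name `lookup_earlier_spread`, the 30-minute lag, and the sample data are all assumptions, not the author's code.

```python
import pandas as pd

def lookup_earlier_spread(row, frame, lag=pd.Timedelta(minutes=30)):
    """Hypothetical helper: spread of the latest row with the same
    {symbol, exchange} created at or before row['created'] - lag."""
    mask = (
        (frame["symbol"] == row["symbol"])
        & (frame["exchange"] == row["exchange"])
        & (frame["created"] <= row["created"] - lag)
    )
    candidates = frame.loc[mask]
    if candidates.empty:
        return float("nan")  # no earlier record for this key
    return candidates.sort_values("created")["spread"].iloc[-1]

# Illustrative data with irregular intervals (values are made up).
df = pd.DataFrame({
    "symbol": ["ETHBTC", "ETHBTC", "ETHBTC", "LTCBTC", "LTCBTC"],
    "exchange": ["binance"] * 5,
    "created": pd.to_datetime([
        "2018-09-01 08:00", "2018-09-01 08:20", "2018-09-01 08:50",
        "2018-09-01 08:00", "2018-09-01 08:40",
    ]),
    "spread": [1.0, 1.5, 2.5, 4.0, 3.0],
})

# Row-wise lookup: one Python function call per row, so not vectorized.
earlier = df.apply(lambda row: lookup_earlier_spread(row, df), axis=1)
df["spread_change"] = df["spread"] - earlier
```

Each call scans the whole frame, so the cost grows roughly with the square of the row count, which is the reason for looking for a join-based alternative.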
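Vectorized!!!!!!!! (Well... mostly)
The idea is to use "merge" (a self-join) instead of "DataFrame.lookup", which is intended for a completely different application (see "Pandas DataFrame.lookup").
Extending the original fix:
Step 1) ProfitLoss.py\ is used to precompute TM1, MonthStart, QuarterStart and YearStart, since it has to be called anyway
Step 2) Merge (i.e. self-join) instead of DataFrame.apply or DataFrame.lookup: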
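The merge code itself is not shown here. Below is a minimal sketch of a self-join used as a lookup, assuming the goal is the first record per {symbol, exchange} within each month, quarter, and year. The helper `add_period_start_spread`, the derived column names, and the sample data are hypothetical.

```python
import pandas as pd

KEYS = ["symbol", "exchange"]

def add_period_start_spread(df, freq, label):
    """Hypothetical helper: attach the spread of the first record of each
    period (per key) via a self-join, then the difference to it."""
    period_col = f"{label}_start"
    out = df.copy()
    # Timestamp of the start of the period each row falls into.
    out[period_col] = out["created"].dt.to_period(freq).dt.to_timestamp()
    # First record per key within each period.
    first = (
        out.sort_values("created")
           .drop_duplicates(subset=KEYS + [period_col], keep="first")
           [KEYS + [period_col, "spread"]]
           .rename(columns={"spread": f"spread_{label}_start"})
    )
    # Self-join: every row picks up the first record of its own period.
    out = out.merge(first, how="left", on=KEYS + [period_col])
    out[f"spread_vs_{label}_start"] = out["spread"] - out[f"spread_{label}_start"]
    return out

# Illustrative data (values are made up).
df = pd.DataFrame({
    "symbol": ["ETHBTC"] * 4,
    "exchange": ["binance"] * 4,
    "created": pd.to_datetime([
        "2018-08-01 00:05", "2018-08-15 12:00",
        "2018-09-01 00:10", "2018-09-20 09:00",
    ]),
    "spread": [1.0, 1.25, 2.0, 2.75],
})

result = df
for label, freq in [("month", "M"), ("quarter", "Q"), ("year", "Y")]:
    result = add_period_start_spread(result, freq, label)
```

The join keys are the original keys plus a derived period-start column, so the per-row lookup becomes a single merge per period type.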
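Actually, I am not sure that merge/self-join is more efficient than an explicit loop. Also, I still have not figured out how to do Sharpe Ratio and MaxDrawdown!! Pandas window functions do not seem to help.
Anyone?! Thanks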
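On the Sharpe ratio and max drawdown point, one possible direction is to use grouped cumulative operations rather than rolling windows. The sketch below assumes an "equity" column holding a cumulative value per symbol, a risk-free rate of zero, and no annualization; the column names and data are made up for illustration.

```python
import pandas as pd

# Illustrative cumulative equity per symbol (values are made up).
pnl = pd.DataFrame({
    "symbol": ["ETHBTC"] * 5 + ["LTCBTC"] * 5,
    "equity": [100.0, 110.0, 99.0, 104.5, 120.0,
               50.0, 40.0, 45.0, 60.0, 54.0],
})

grouped = pnl.groupby("symbol")["equity"]

# Max drawdown: largest relative drop from the running peak, per symbol.
running_peak = grouped.cummax()
pnl["drawdown"] = pnl["equity"] / running_peak - 1.0
max_drawdown = pnl.groupby("symbol")["drawdown"].min()

# Per-period Sharpe ratio (assumption: risk-free rate 0, not annualized).
pnl["ret"] = pnl["equity"] / grouped.shift(1) - 1.0
ret_by_symbol = pnl.groupby("symbol")["ret"]
sharpe = ret_by_symbol.mean() / ret_by_symbol.std()
```

Both quantities come out as one value per symbol without any explicit Python loop.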
This assumes that created has a constant 15-minute interval. You can groupby symbol and exchange, and shift down by 2 (two periods, since each period is 15 minutes):
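A minimal sketch of the groupby-and-shift idea described above, assuming rows arrive at a constant 15-minute spacing per key. The sample values and the derived column names are illustrative only.

```python
import pandas as pd

# Illustrative data: two keys, three 15-minute snapshots each.
df = pd.DataFrame({
    "symbol": ["ETHBTC", "LTCBTC"] * 3,
    "exchange": ["binance"] * 6,
    "created": pd.to_datetime(
        ["2018-09-01 08:15"] * 2
        + ["2018-09-01 08:30"] * 2
        + ["2018-09-01 08:45"] * 2
    ),
    "spread": [1.0, 10.0, 1.5, 10.0, 2.5, 7.0],
})

# Sort so that shifting within a group moves back in time.
df = df.sort_values(["symbol", "exchange", "created"]).reset_index(drop=True)

# Two periods of 15 minutes = the spread from 30 minutes earlier.
df["spread_30m_ago"] = df.groupby(["symbol", "exchange"])["spread"].shift(2)
df["spread_change"] = df["spread"] - df["spread_30m_ago"]
```

The first two rows of each group have no record 30 minutes earlier, so they come out as NaN. If the interval is not constant, as the question's clarification says, `pandas.merge_asof` with its `by=` argument may be worth trying instead of a fixed shift.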