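I'm trying to replicate Example 2.7 from Ernie Chan's seminal book Algorithmic Trading (page 55) in Python. There isn't much material about it online, but the statsmodels library has been very helpful. However, the eigenvector my code produces looks wrong, because its values don't relate sensibly to the test data. Here is the code, step by step: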
import pandas as pd
import yfinance as yf
from datetime import datetime
from dateutil.relativedelta import relativedelta
years = 5
today = datetime.today().strftime('%Y-%m-%d')
lastyeartoday = (datetime.today() - relativedelta(years=years)).strftime('%Y-%m-%d')
symbols = ['BTC-USD', 'BCH-USD','ETH-USD']
df = yf.download(symbols,
                 start=lastyeartoday,
                 end=today,
                 progress=False)
df = df.dropna()
# Keep only the closing price of each symbol
data = pd.DataFrame()
for symbol in symbols:
    data[symbol] = df['Close'][symbol]
data.tail()
This produces the following output:
Let's plot the three series:
# Plot the prices series
import matplotlib.pyplot as plt
%matplotlib inline
for symbol in symbols:
    data[symbol].plot(figsize=(10,8))
plt.show()
Chart:
Now we run the Johansen cointegration test on the dataset:
import numpy as np
import pandas as pd
import statsmodels.api as sm
# data = pd.read_csv("http://web.pdx.edu/~crkl/ceR/data/usyc87.txt",index_col='YEAR',sep='\s+',nrows=66)
# y = data['Y']
# c = data['C']
from statsmodels.tsa.vector_ar.vecm import coint_johansen
"""
Johansen cointegration test of the cointegration rank of a VECM

Parameters
----------
endog : array_like (nobs_tot x neqs)
    Data to test
det_order : int
    * -1 - no deterministic terms - model1
    * 0 - constant term - model3
    * 1 - linear trend
k_ar_diff : int, nonnegative
    Number of lagged differences in the model.

Returns
-------
result: Holder
    An object containing the results which can be accessed using dot-notation. The object's attributes are
    eig: (neqs) - Eigenvalues.
    evec: (neqs x neqs) - Eigenvectors.
    lr1: (neqs) - Trace statistic.
    lr2: (neqs) - Maximum eigenvalue statistic.
    cvt: (neqs x 3) - Critical values (90%, 95%, 99%) for trace statistic.
    cvm: (neqs x 3) - Critical values (90%, 95%, 99%) for maximum eigenvalue statistic.
    method: str "johansen"
    r0t: (nobs x neqs) - Residuals for ΔY.
    rkt: (nobs x neqs) - Residuals for Y−1.
    ind: (neqs) - Order of eigenvalues.
"""
def joh_output(res):
    # Print both test statistics side by side, followed by their critical values
    output = pd.DataFrame([res.lr2,res.lr1],
                          index=['max_eig_stat',"trace_stat"])
    print(output.T,'\n')
    print("Critical values(90%, 95%, 99%) of max_eig_stat\n",res.cvm,'\n')
    print("Critical values(90%, 95%, 99%) of trace_stat\n",res.cvt,'\n')
# model with constant/trend (deterministic) term with lags set to 1
joh_model = coint_johansen(data,0,1) # k_ar_diff +1 = K
joh_output(joh_model)
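Since the test statistics are far larger than the critical values, we can reject the null hypothesis and conclude that the three crypto pairs are very strongly cointegrated.
Now let's print the eigenvalues:
array([0.02903038, 0.01993949, 0.00584357])
The first row of our eigenvectors should be considered the strongest, since it has the shortest mean-reversion half-life: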
print('Eigenvector in scientific notation:\n{0}\n'.format(joh_model.evec[0]))
print('Eigenvector in decimal notation:')
i = 0
for val in joh_model.evec[0]:
    print('{0}: {1:.10f}'.format(i, val))
    i += 1
Result:
Eigenvector in scientific notation:
[ 2.21531848e-04 -1.70103937e-04 -9.403745e-05]
Eigenvector in decimal notation:
0: 0.0002215318
1: -0.0001701039
2: -0.0000940375
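For context on the half-life criterion mentioned above: in the book, the half-life of mean reversion is estimated by regressing the one-period change of a series on its lagged level (with an intercept) and taking -ln(2)/λ, where λ is the slope. Below is a minimal pure-Python sketch of that calculation. The function name `half_life` and the toy series are illustrative assumptions, not code from the book or from statsmodels.

```python
import math

def half_life(series):
    """Estimate the mean-reversion half-life of a series.

    Fits dy[t] = lam * y[t-1] + c by ordinary least squares and
    returns -ln(2) / lam. A slope lam >= 0 means no mean reversion,
    in which case the result is not meaningful.
    """
    y_lag = series[:-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(y_lag)
    mean_x = sum(y_lag) / n
    mean_y = sum(dy) / n
    # OLS slope = cov(y_lag, dy) / var(y_lag)
    cov = sum((x - mean_x) * (d - mean_y) for x, d in zip(y_lag, dy))
    var = sum((x - mean_x) ** 2 for x in y_lag)
    lam = cov / var
    return -math.log(2) / lam

# Toy series (hypothetical): each value is half the previous one, so lam = -0.5
toy = [0.5 ** t for t in range(10)]
print(half_life(toy))
```

In the question's setting, `series` would be the value of the portfolio formed from one eigenvector, computed once per eigenvector and compared.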
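These eigenvector values are the problem I mentioned in the introduction. According to Ernie's description, they should correspond to the hedge ratio of each pair. However, a) they are tiny, b) two of them are negative (which is clearly wrong for these three crypto pairs), and c) they seem completely unrelated to the test data (e.g., BTC obviously trades at a huge premium and should have the smallest value).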
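Two things may be worth checking here. First, to my understanding statsmodels follows the same convention as the MATLAB code in the book (`results.evec(:, 1)`) and stores each eigenvector as a column of `evec`, so the first eigenvector would be `joh_model.evec[:, 0]` rather than the row `joh_model.evec[0]`. Second, an eigenvector is only defined up to scale and its entries are unit counts of each asset, so small magnitudes and mixed signs (long some legs, short others) are expected. The sketch below uses made-up numbers; the helpers `first_column`, `normalize`, and `spread_value` are hypothetical names introduced only for illustration.

```python
def first_column(matrix):
    """Return column 0 of a row-major matrix (list of rows)."""
    return [row[0] for row in matrix]

def normalize(weights):
    """Rescale weights so the first asset has weight 1."""
    return [w / weights[0] for w in weights]

def spread_value(prices, weights):
    """Value of the portfolio holding weights[i] units of asset i."""
    return sum(p * w for p, w in zip(prices, weights))

# Hypothetical 3x3 eigenvector matrix, NOT output from the question's data
evec = [[0.5, 0.1, 0.3],
        [-1.0, 0.2, 0.1],
        [0.25, 0.5, 0.7]]

weights = first_column(evec)   # first eigenvector = first column
ratios = normalize(weights)    # hedge ratios relative to the first asset
prices = [100.0, 40.0, 20.0]   # hypothetical prices on one day

print(ratios)
print(spread_value(prices, ratios))

# With the real test result this would be something like:
#   weights = first_column(joh_model.evec.tolist())
```

Read this way, a negative entry just means that leg is held short against the others.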
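Now, I'm no math genius and there's a good chance I messed up somewhere, which is why I've included all the code and steps needed to reproduce this. Any pointers and insights would be greatly appreciated. Many thanks.
Update: Following MilTom's suggestion, I converted the dataset to percentage returns, with the following results: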
   max_eig_stat  trace_stat
0    127.076209  133.963475
1      6.581045    6.887266
2      0.306221    0.306221
Critical values(90%, 95%, 99%) of max_eig_stat
[[18.8928 21.1314 25.865 ]
[12.2971 14.2639 18.52 ]
[ 2.7055 3.8415 6.6349]]
Critical values(90%, 95%, 99%) of trace_stat
[[27.0669 29.7961 35.4628]
[13.4294 15.4943 19.9349]
[ 2.7055 3.8415 6.6349]]
Eigenvector in scientific notation:
[ 0.00400041 -0.01952632 -0.0133122 ]
Eigenvector in decimal notation:
0: 0.0040004070
1: -0.0195263209
2: -0.0133122020
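This looks more plausible, but given the low values in rows 1 and 2, the Johansen test appears unable to reject the null hypothesis for those rows. Clearly there is no correlation, at least that is how I read the results.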
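One common way to read such a table is sequentially: row r of the trace test has the null hypothesis of at most r cointegrating relations, and you stop at the first row whose statistic does not exceed its critical value. Below is a minimal sketch applying that rule to the trace statistics and 95% critical values printed in the update. `cointegration_rank` is a hypothetical helper, not a statsmodels function.

```python
def cointegration_rank(stats, crit_values):
    """Count how many leading null hypotheses are rejected.

    stats[r] is the test statistic for H0: rank <= r, and
    crit_values[r] is the matching critical value. Testing stops
    at the first row that fails to reject.
    """
    rank = 0
    for stat, crit in zip(stats, crit_values):
        if stat <= crit:
            break
        rank += 1
    return rank

# Trace statistics and 95% critical values copied from the output in the update
trace_stat = [133.963475, 6.887266, 0.306221]
trace_crit_95 = [29.7961, 15.4943, 3.8415]

print(cointegration_rank(trace_stat, trace_crit_95))

# With the result object this would be something like:
#   cointegration_rank(joh_model.lr1, joh_model.cvt[:, 1])
```

Under this reading, rejecting in row 0 and failing to reject in rows 1 and 2 points to a single cointegrating relation rather than none.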