Johansen test produces incorrect eigenvectors

Published 2024-06-26 13:43:51


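I'm trying to replicate in Python Example 2.7 from Ernie Chan's seminal book Algorithmic Trading (page 55). There isn't much material on this online, but the statsmodels library has been very helpful. However, the eigenvectors my code produces look wrong: their values don't relate in any sensible way to the test data. Here is the code, step by step: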

import pandas as pd
import yfinance as yf
from datetime import datetime
from dateutil.relativedelta import relativedelta

# Download roughly 5 years of daily data up to today
years = 5
today = datetime.today().strftime('%Y-%m-%d')
start_date = (datetime.today() - relativedelta(years=years)).strftime('%Y-%m-%d')
symbols = ['BTC-USD', 'BCH-USD', 'ETH-USD']

df = yf.download(symbols,
                 start=start_date,
                 end=today,
                 progress=False)
df = df.dropna()  # keep only dates on which all three symbols have data

# Keep only the closing prices, one column per symbol
data = pd.DataFrame()
for symbol in symbols:
    data[symbol] = df['Close'][symbol]

data.tail()

This produces the following output:

[Image: output of data.tail(), the last rows of closing prices]

Let's plot the three series:

# Plot the price series (the %matplotlib magic below requires Jupyter/IPython)
import matplotlib.pyplot as plt
%matplotlib inline
for symbol in symbols:
    data[symbol].plot(figsize=(10,8))

plt.show()

The chart:

[Image: price chart of BTC-USD, BCH-USD and ETH-USD]

Now we run the Johansen cointegration test on the dataset:

import pandas as pd
from statsmodels.tsa.vector_ar.vecm import coint_johansen

"""
    Johansen cointegration test of the cointegration rank of a VECM

    Parameters
    ----------
    endog : array_like (nobs_tot x neqs)
        Data to test
    det_order : int
        * -1 - no deterministic terms - model1
        * 0 - constant term - model3
        * 1 - linear trend
    k_ar_diff : int, nonnegative
        Number of lagged differences in the model.
        
    Returns
    -------
    result: Holder
    An object containing the results which can be accessed using dot-notation. The object’s attributes are

    eig: (neqs) - Eigenvalues.
    evec: (neqs x neqs) - Eigenvectors.
    lr1: (neqs) - Trace statistic.
    lr2: (neqs) - Maximum eigenvalue statistic.
    cvt: (neqs x 3) - Critical values (90%, 95%, 99%) for trace statistic.
    cvm: (neqs x 3) - Critical values (90%, 95%, 99%) for maximum eigenvalue statistic. 
    method: str “johansen”
    r0t: (nobs x neqs) - Residuals for Δ𝑌.
    rkt: (nobs x neqs) - Residuals for 𝑌−1.
    ind: (neqs) - Order of eigenvalues.
    """

def joh_output(res):
    output = pd.DataFrame([res.lr2,res.lr1],
                          index=['max_eig_stat',"trace_stat"])
    print(output.T,'\n')
    print("Critical values(90%, 95%, 99%) of max_eig_stat\n",res.cvm,'\n')
    print("Critical values(90%, 95%, 99%) of trace_stat\n",res.cvt,'\n')


# det_order=0: constant term only; k_ar_diff=1: one lagged difference,
# i.e. K = k_ar_diff + 1 = 2 lags in the underlying VAR in levels
joh_model = coint_johansen(data, det_order=0, k_ar_diff=1)
joh_output(joh_model)
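For reference, here is a sketch of the model that coint_johansen tests, written in standard Johansen notation (the notation is not taken from the post). Y_t is the vector of the three price series. There are k = k_ar_diff lagged differences, so K = k + 1 lags in levels. μ is the constant that det_order=0 adds, and λ̂_1 ≥ λ̂_2 ≥ λ̂_3 are the eigenvalues (joh_model.eig). The columns of β are the cointegrating vectors. The trace and max-eigenvalue statistics correspond to lr1 and lr2 respectively.

```latex
$$\Delta Y_t = \Pi Y_{t-1} + \sum_{i=1}^{k} \Gamma_i \,\Delta Y_{t-i} + \mu + \varepsilon_t,
\qquad \Pi = \alpha\beta^{\top}, \qquad K = k + 1$$

$$LR_{\mathrm{tr}}(r) = -T \sum_{i=r+1}^{n} \ln\bigl(1 - \hat\lambda_i\bigr),
\qquad
LR_{\max}(r) = -T \ln\bigl(1 - \hat\lambda_{r+1}\bigr)$$
```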

[Image: Johansen test output, test statistics and critical values]

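Since the test statistics are far larger than the critical values, we can reject the null hypothesis and conclude that the three crypto pairs are very strongly cointegrated.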
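The comparison with critical values is normally done row by row. Row r of the trace table tests H0 "at most r cointegrating relations" against "more than r", and the estimated rank is the first r whose null cannot be rejected. The helper below is a minimal sketch of that sequential procedure; its name, johansen_rank, is hypothetical. It is fed the trace statistics and critical values from the update at the end of this post. With a statsmodels result it would be called as johansen_rank(joh_model.lr1, joh_model.cvt).

```python
# Sequential Johansen trace test: row r tests H0 "rank <= r" against
# H1 "rank > r". The estimated rank is the first r whose null is NOT rejected.
def johansen_rank(trace_stats, crit_values, col=1):
    """Return the estimated cointegration rank (hypothetical helper).

    trace_stats: sequence of trace statistics (res.lr1 in statsmodels)
    crit_values: rows of [90%, 95%, 99%] critical values (res.cvt)
    col: which critical-value column to use (1 -> 95%)
    """
    for r, (stat, cv) in enumerate(zip(trace_stats, crit_values)):
        if stat <= cv[col]:
            return r  # cannot reject "rank <= r"
    return len(trace_stats)  # every null rejected: full rank


# Numbers copied from the update at the end of this post (returns data).
trace = [133.963475, 6.887266, 0.306221]
cvt = [[27.0669, 29.7961, 35.4628],
       [13.4294, 15.4943, 19.9349],
       [2.7055, 3.8415, 6.6349]]
print(johansen_rank(trace, cvt))  # -> 1
```

Read this way, a large statistic in row 0 alone indicates one cointegrating relation, not "strong" cointegration across all rows.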

Now let's print the eigenvalues:

array([0.02903038, 0.01993949, 0.00584357])

The first row of our eigenvectors should be considered the strongest, since it has the shortest mean-reversion half-life:

# evec[0] selects the first ROW of the eigenvector matrix
print('Eigenvector in scientific notation:\n{0}\n'.format(joh_model.evec[0]))
print('Eigenvector in decimal notation:')
for i, val in enumerate(joh_model.evec[0]):
    print('{0}: {1:.10f}'.format(i, val))

Result:

Eigenvector in scientific notation:
[ 2.21531848e-04 -1.70103937e-04 -9.403745e-05]

Eigenvector in decimal notation:
0: 0.0002215318
1: -0.0001701039
2: -0.0000940375

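This is the problem I mentioned in the introduction. According to Ernie's description, these values should correspond to the hedge ratios for each of the pairs. However, they are a) small, b) two of them are negative (which is clearly incorrect for these three crypto pairs), and c) seemingly unrelated to the test data (for example, BTC obviously trades at a huge premium and should have the smallest value).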
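One thing worth checking: numpy.linalg.eig returns eigenvectors as the columns of its matrix. statsmodels' coint_johansen likewise stores them column-wise in evec, sorted by decreasing eigenvalue. Indexing evec[0] therefore takes the first row, that is, the first component of every eigenvector, rather than the first eigenvector evec[:, 0]. The numpy-only check below illustrates the difference on an arbitrary 2x2 matrix, not the Johansen matrix itself. The helper is_eigenvector is a hypothetical name.

```python
import numpy as np


def is_eigenvector(A, v, tol=1e-9):
    """True if A @ v is a scalar multiple of v (i.e. v is an eigenvector of A)."""
    lam = v @ A @ v / (v @ v)  # Rayleigh quotient: equals the eigenvalue if v is one
    return np.allclose(A @ v, lam * v, atol=tol)


# Upper-triangular matrix: eigenvalues 2 and 3, eigenvectors
# proportional to [1, 0] and [1, 1].
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
w, V = np.linalg.eig(A)  # eigenvalues in w, eigenvectors in the COLUMNS of V

print(is_eigenvector(A, V[:, 0]))  # first column -> True
print(is_eigenvector(A, V[0]))     # first row    -> False
```

Applied to the code above, the eigenvector belonging to the largest eigenvalue would be joh_model.evec[:, 0].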

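Now, I'm no math genius, and it's quite possible I messed up somewhere, which is why I've included all the code and steps needed to reproduce this. Any pointers and insights would be greatly appreciated. Many thanks.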
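The hedge-ratio and half-life reasoning above can be sketched as follows, under stated assumptions. The eigenvector entries are treated as numbers of units held of each asset, so a negative entry is simply a short position. The portfolio's market value is the price matrix (levels) times that vector. The half-life comes from regressing the change in the spread on its lagged value, which I believe mirrors the half-life calculation in Chan's book. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def portfolio_spread(prices, weights):
    """Market value of a portfolio holding weights[j] units of asset j.

    prices: (T, n) array of price LEVELS; weights: length-n vector,
    e.g. a Johansen eigenvector. Negative weights are short positions.
    """
    return np.asarray(prices) @ np.asarray(weights)


def half_life(spread):
    """Half-life from the OLS fit  d(spread)_t = lam * spread_{t-1} + c + e_t."""
    y_lag = spread[:-1]
    dy = np.diff(spread)
    X = np.column_stack([y_lag, np.ones_like(y_lag)])
    lam = np.linalg.lstsq(X, dy, rcond=None)[0][0]
    return -np.log(2) / lam  # meaningful only when lam < 0 (mean reverting)


# Synthetic example (hypothetical data): asset B is a random walk and
# asset A = 2*B + an AR(1) deviation, so holding 1 A and -2 B is stationary.
rng = np.random.default_rng(0)
T = 5000
b = 100 + np.cumsum(rng.normal(size=T))
dev = np.zeros(T)
for t in range(1, T):
    dev[t] = 0.9 * dev[t - 1] + rng.normal()
a = 2 * b + dev
prices = np.column_stack([a, b])

spread = portfolio_spread(prices, [1.0, -2.0])
print(half_life(spread))  # AR coefficient 0.9 implies lam near -0.1
```

With an AR coefficient of 0.9, the estimate should land in the neighborhood of ln(2)/0.1 ≈ 6.9 periods.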

Update: following MilTom's suggestion, I converted the dataset to percentage returns, with the following results:

 max_eig_stat  trace_stat
0    127.076209  133.963475
1      6.581045    6.887266
2      0.306221    0.306221 

Critical values(90%, 95%, 99%) of max_eig_stat
 [[18.8928 21.1314 25.865 ]
 [12.2971 14.2639 18.52  ]
 [ 2.7055  3.8415  6.6349]] 

Critical values(90%, 95%, 99%) of trace_stat
 [[27.0669 29.7961 35.4628]
 [13.4294 15.4943 19.9349]
 [ 2.7055  3.8415  6.6349]] 

Eigenvector in scientific notation:
[ 0.00400041 -0.01952632 -0.0133122 ]

Eigenvector in decimal notation:
0: 0.0040004070
1: -0.0195263209
2: -0.0133122020

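This looks more reasonable, but given the low values in rows 1 and 2, the Johansen test seems unable to reject the null hypothesis. Clearly there is no correlation, at least that is how I read the results.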
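The post doesn't show how the returns were computed. Assuming simple percentage returns, pandas' DataFrame.pct_change does the conversion, as in the sketch below with made-up prices. One hedged caveat: cointegration is a property of non-stationary (I(1)) series, and Chan's example applies the test to price levels. Returns are typically stationary already, so a Johansen test on returns is not the test from the book.

```python
import pandas as pd

# Hypothetical price levels for two assets (made-up numbers).
prices = pd.DataFrame({"A": [100.0, 110.0, 99.0],
                       "B": [50.0, 50.0, 55.0]})

# Percentage returns: r_t = P_t / P_{t-1} - 1 (the first row is NaN, so drop it).
returns = prices.pct_change().dropna()
print(returns)
```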

