Python linear regression model (Pandas, statsmodels) ValueError: endog and exog matrices are different sizes

# We have a portfolio constructed of 3 randomly generated factors (fac1, fac2, fac3).
# Python code provides the following message
# ValueError: The indices for endog and exog are not aligned
import random
import string

import numpy as np
import pandas as pd
import statsmodels.api as sm
from numpy.random import rand

fac1, fac2, fac3 = np.random.rand(3, 1000)  # Generate random factors

# Consider a collection of hypothetical stock portfolios
# Generate randomly 1000 tickers
random.seed(0)
N = 1000
def rands(n):
    choices = string.ascii_uppercase
    return ''.join([random.choice(choices) for _ in range(n)])
tickers = np.array([rands(5) for _ in range(N)])
ticker_subset = tickers.take(np.random.permutation(N)[:1000])

# Weighted sum of factors plus noise
port = pd.Series(0.7 * fac1 - 1.2 * fac2 + 0.3 * fac3 + rand(1000), index=ticker_subset)
factors = pd.DataFrame({'f1': fac1, 'f2': fac2, 'f3': fac3}, index=ticker_subset)

# Correlations between each factor and the portfolio
#print(factors.corrwith(port))
factors1 = sm.add_constant(factors)

# Calculate factor exposures using a regression estimated by OLS
#print(sm.OLS(np.asarray(port), np.asarray(factors1)).fit().params)

# Calculate the exposure on each industry
def beta_exposure(chunk, factors=None):
    return sm.OLS(np.asarray(chunk), np.asarray(factors)).fit().params

# Assume that we have only two industries: financial and tech
ind_names = np.array(['Financial', 'Tech'])
# Create a random industry classification
sampler = np.random.randint(0, len(ind_names), N)
industries = pd.Series(ind_names[sampler], index=tickers, name='industry')
by_ind = port.groupby(industries)
exposures = by_ind.apply(beta_exposure, factors=factors1)
print(exposures)
#exposures.unstack()
# Determine the exposures on each industry

1 answer

User

#1 · Posted 2024-09-26 17:48:16

Understanding the error message:

ValueError: endog and exog matrices are different sizes

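OK, that's not so bad. The endogenous and exogenous matrices are different sizes. The statsmodels documentation has a page on this terminology, which explains that endogenous factors are those within the system and exogenous factors are those outside it.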

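Some debugging

Check the shapes of the arrays we are getting. To do this, we need to break the one-liner apart and print the .shape of each argument, or print the first few elements of each. Also, comment out the line that throws the error. Doing so, we find: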

chunk [490]
factor [1000    4]
chunk [510]
factor [1000    4]

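Oh! There it is. We expected the factors to be chunked as well. It should be [490 4] the first time and [510 4] the second. Note: since the categories are assigned randomly, these counts will differ on every run.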
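The debugging step described above can be sketched roughly as follows. This is a minimal illustration on a tiny synthetic data set, not the original poster's data: the names `debug_exposure` and `seen_shapes`, the ten tickers, and the fixed 4/6 industry split are all assumptions made for the example. The function prints shapes instead of fitting a model, which mirrors commenting out the failing line.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10
index = [f"T{i}" for i in range(n)]  # hypothetical tickers

port = pd.Series(rng.random(n), index=index)
factors = pd.DataFrame(rng.random((n, 4)), index=index,
                       columns=["const", "f1", "f2", "f3"])
# Fixed split (4 Financial, 6 Tech) so the shapes are predictable
industries = pd.Series(["Financial"] * 4 + ["Tech"] * 6,
                       index=index, name="industry")

seen_shapes = []

def debug_exposure(chunk, factors=None):
    # Report the shapes of both arguments instead of fitting the model
    shapes = (np.asarray(chunk).shape, np.asarray(factors).shape)
    print("chunk", shapes[0], "factor", shapes[1])
    seen_shapes.append(shapes)
    return len(chunk)

sizes = port.groupby(industries).apply(debug_exposure, factors=factors)
```

The chunk shrinks to the size of each group while the factor array keeps all of its rows, which is the same mismatch seen in the output above.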

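So basically we are passing too much information into this function. We can use the chunk to work out which factor rows to select, filter the factors down to those rows, and then everything will work.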

Looking at the function definition in the documentation:

class statsmodels.regression.linear_model.OLS(endog, exog=None, missing='none', hasconst=None, **kwargs)
We only passed two arguments; the rest are optional. Let's look at the two we passed.
endog (array-like) – 1-d endogenous response variable. The dependent variable.
exog (array-like) – A nobs x k array where nobs is the number of observations and k is the number of regressors...
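Ah, endog and exog again. endog is a 1-d array. So far so good: a shape of 490 fits. exog is nobs x k? Oh, nobs is the number of observations. So exog is a 2-d array, and in this example it needs to have shape 490 by 4.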
The specific problem:
beta_exposure should be:
def beta_exposure(chunk, factors=None):
    # Keep only the factor rows whose tickers belong to this industry chunk
    factors = factors.loc[factors.index.isin(chunk.index)]
    return sm.OLS(np.asarray(chunk), np.asarray(factors)).fit().params
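The problem is that you apply beta_exposure to each part of the list (the split is random, so say 490 elements for Financial and 510 for Tech), but factors=factors1 always hands you all 1000 rows (the groupby code does not touch it).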
See http://www.statsmodels.org/dev/generated/statsmodels.regression.linear_model.OLS.html and http://www.statsmodels.org/dev/endog_exog.html for the references I used to research this problem.
