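<p>Based on a <a href="https://stackoverflow.com/questions/3042117/screening-multicollinearity-in-a-regression-model?rq=1">similar question</a> for R, there are some other options that may help people. I was looking for a single number that captures collinearity; the options include the determinant and the condition number of the correlation matrix.</p>
<p>According to one of the R answers, the determinant of the correlation matrix will "range from 0 (perfect collinearity) to 1 (no collinearity)". I found the bounded range helpful.</p>
<p>A translated example for the determinant:</p>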
<pre><code>import numpy as np
import pandas as pd
# Create a sample random dataframe
np.random.seed(321)
x1 = np.random.rand(100)
x2 = np.random.rand(100)
x3 = np.random.rand(100)
df = pd.DataFrame({'x1': x1, 'x2': x2, 'x3': x3})
# Now create a dataframe with multicollinearity
multicollinear_df = df.copy()
multicollinear_df['x3'] = multicollinear_df['x1'] + multicollinear_df['x2']
# Compute both correlation matrices
corr = np.corrcoef(df, rowvar=0)
multicollinear_corr = np.corrcoef(multicollinear_df, rowvar=0)
# Compare the determinants
print(np.linalg.det(corr))  # 0.988532159861
print(np.linalg.det(multicollinear_corr))  # 2.97779797328e-16 (effectively zero)
</code></pre>
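<p>For intuition about the bounded range, the two-variable case can be checked by hand: the correlation matrix is [[1, r], [r, 1]], so its determinant is 1 - r². The following is a minimal sketch of that check; the variable names, seed, and noise level are illustrative assumptions and do not come from the R answer.</p>

```python
import numpy as np

# Illustrative data (assumed, not from the original answer):
# b is strongly, but not perfectly, correlated with a.
rng = np.random.RandomState(0)
a = rng.rand(200)
b = a + 0.1 * rng.rand(200)

corr_ab = np.corrcoef(a, b)       # 2x2 correlation matrix [[1, r], [r, 1]]
r = corr_ab[0, 1]                 # Pearson correlation between a and b
det_ab = np.linalg.det(corr_ab)   # should match 1 - r**2

# The two printed numbers should agree up to floating-point error.
print(det_ab, 1 - r ** 2)
```

<p>As |r| approaches 1 the determinant falls toward 0, and when r is 0 it equals 1, which matches the range quoted above.</p>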
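<p>Similarly, the condition number of the correlation matrix (the same holds for the covariance matrix) approaches infinity as the variables approach perfect linear dependence.</p>
<pre><code>print(np.linalg.cond(corr))  # 1.23116253259
print(np.linalg.cond(multicollinear_corr))  # 6.19985218873e+15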
</code></pre>
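<p>If both numbers are wanted together, the two checks can be wrapped in a small helper. This is only a sketch: <code>collinearity_summary</code> is a hypothetical name, not a NumPy or pandas function, and the data below is regenerated with an assumed seed, so the values will not necessarily match the ones printed above.</p>

```python
import numpy as np

def collinearity_summary(data):
    """Hypothetical helper: determinant and condition number of the
    correlation matrix of `data` (rows are observations, columns are variables).
    Accepts a 2-D array or anything np.asarray can convert, e.g. a DataFrame."""
    values = np.asarray(data, dtype=float)
    corr = np.corrcoef(values, rowvar=False)
    return {"det": np.linalg.det(corr), "cond": np.linalg.cond(corr)}

# Assumed sample data: three independent uniform columns.
rng = np.random.RandomState(321)
independent = rng.rand(100, 3)

# Make the third column an exact linear combination of the first two.
collinear = independent.copy()
collinear[:, 2] = collinear[:, 0] + collinear[:, 1]

independent_summary = collinearity_summary(independent)
collinear_summary = collinearity_summary(collinear)
print(independent_summary)  # det near 1, small condition number
print(collinear_summary)    # det near 0, very large condition number
```

<p>The determinant is convenient because it is bounded, while the condition number is unbounded and grows very large well before the determinant visibly reaches zero, so reporting both can be useful.</p>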