Different values from numpy.var() and pandas.var()

from sklearn.preprocessing import StandardScaler

ss = StandardScaler()
var = ss.fit_transform(catDf.iloc[:, 1:-1]).var()  # This variance is equal to 1
catDf.iloc[:, 1:-1] = ss.fit_transform(catDf.iloc[:, 1:-1])
print("Variance in Numpy array", var)  # Approx 1
print("Variance in Data Frame", catDf.var())  # 1.5 for both numerical columns

1 Answer

Anonymous user

Reply #1 · Posted 2024-10-02 00:36:36

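The discrepancy comes from the two libraries using different degrees of freedom (ddof). The scikit-learn documentation states that StandardScaler uses the biased estimator, which divides by N: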

We use a biased estimator for the standard deviation, equivalent to numpy.std(x, ddof=0). Note that the choice of ddof is unlikely to affect model performance.

DataFrame.var, on the other hand, uses the unbiased estimator by default:

Return unbiased variance over requested axis. Normalized by N-1 by default. This can be changed using the ddof argument

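With only 3 data points, dividing by 2 instead of 3 makes the result 3/2 = 1.5 times larger, which is exactly the difference you are seeing. To get matching values, pass ddof=0 to DataFrame.var():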

print(catDf.var(ddof=0))
#GDP     1.0
#Area    1.0
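
To see the effect in isolation, here is a minimal sketch that does not need scikit-learn. The data values are made up for illustration (only the column names GDP and Area come from the output above), and the standardization is done by hand with ddof=0, which is assumed to mirror what StandardScaler does according to the quoted documentation.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-row data set; the values are invented for illustration.
df = pd.DataFrame({"GDP": [1.0, 2.0, 4.0], "Area": [10.0, 30.0, 20.0]})

# Standardize by hand: subtract the column mean and divide by the
# population standard deviation (ddof=0), as the scikit-learn docs describe.
scaled = (df - df.mean()) / df.std(ddof=0)

var_numpy = scaled.to_numpy().var()     # NumPy default is ddof=0 (divide by N)
var_pandas = scaled.var()               # pandas default is ddof=1 (divide by N-1)
var_pandas_ddof0 = scaled.var(ddof=0)   # forced to divide by N

n = len(scaled)
print("NumPy var (ddof=0):", var_numpy)
print("pandas var (ddof=1):")
print(var_pandas)
print("pandas var (ddof=0):")
print(var_pandas_ddof0)
print("Ratio N / (N - 1):", n / (n - 1))
```

The ratio between the two pandas results is N / (N - 1), so the gap shrinks quickly as the number of rows grows and is only this noticeable because N is 3 here.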
