R.scale（）和sklearn.preprocessing.scale()

# import packages from sklearn import preprocessing import numpy import rpy2.robjects.numpy2ri from rpy2.robjects.packages import importr rpy2.robjects.numpy2ri.activate() # Set up R namespaces R = rpy2.robjects.r np1 = numpy.array([[1.0,2.0],[3.0,1.0]]) print "Numpy-array:" print np1 print "Scaled numpy array through R.scale()" print R.scale(np1) print "-------" print "Scaled numpy array through preprocessing.scale()" print preprocessing.scale(np1, axis = 0, with_mean = True, with_std = True) scaler = preprocessing.StandardScaler() scaler.fit(np1) print "Mean of preprocessing.scale():" print scaler.mean_ print "Std of preprocessing.scale():" print scaler.std_

2条回答

网友

1楼 · 编辑于 2024-06-28 19:24:54

R.scale文档说明：

The root-mean-square for a (possibly centered) column is defined as sqrt(sum(x^2)/(n-1)), where x is a vector of the non-missing values and n is the number of non-missing values. In the case center = TRUE, this is the same as the standard deviation, but in general it is not. (To scale by the standard deviations without centering, use scale(x, center = FALSE, scale = apply(x, 2, sd, na.rm = TRUE)).)

但是，sklearn.preprocessing.StandardScale总是按标准差缩放。在

在我的例子中，我想在Python中复制R.scale而不使用centered，我以一种稍微不同的方式遵循了@Sid advice：

import numpy as np

def get_scale_1d(v):
    # I copy this function from R source code haha
    v = v[~np.isnan(v)]
    std = np.sqrt(
        np.sum(v ** 2) / np.max([1, len(v) - 1])
    )
    return std

sc = StandardScaler()
sc.fit(data)
sc.std_ = np.apply_along_axis(func1d=get_scale_1d, axis=0, arr=x)
sc.transform(data)

网友

2楼 · 编辑于 2024-06-28 19:24:54

这似乎与如何计算标准差有关。在

>>> import numpy as np
>>> a = np.array([[1, 2],[3, 1]])
>>> np.std(a, axis=0)
array([ 1. ,  0.5])
>>> np.std(a, axis=0, ddof=1)
array([ 1.41421356,  0.70710678])

从numpy.stddocumentation

ddof : int, optional
Means Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements. By default ddof is zero.

显然，R.scale()使用ddof=1，但是{}使用{}。在

编辑：（解释如何使用备用ddof）

在不访问StandardScaler（）对象本身的变量的情况下，似乎没有一种简单的方法可以使用备用ddof计算std。在

^{pr2}$

相关问题更多 >

编程相关推荐

热门问题

热门文章