I want to use StandardScaler to scale my data. I have loaded the data into Python, and it appears to be sparse. To apply StandardScaler with withMean=True, we should first convert it to a dense type. Any idea how?
trainData = MLUtils.loadLibSVMFile(sc, trainDataPath)
valData = MLUtils.loadLibSVMFile(sc, valDataPath)
trainLabel = trainData.map(lambda x: x.label)
trainFeatures = trainData.map(lambda x: x.features)
valLabel = valData.map(lambda x: x.label)
valFeatures = valData.map(lambda x: x.features)
scaler = StandardScaler(withMean=True, withStd=True).fit(trainFeatures)
# apply the scaler to the data. Here, trainFeatures is a sparse RDD; we first need to convert it to a dense type
trainFeatures_scaled = scaler.transform(trainFeatures)
valFeatures_scaled = scaler.transform(valFeatures)
# merge `trainLabel` and `trainFeatures_scaled` into a new RDD
trainData1 = ...
valData1 = ...
# use the scaled data, i.e., trainData1 and valData1, to train a model
...
The code above has errors. I have two questions:

1. How can I convert trainFeatures into a dense type that can be used as input to StandardScaler?
2. How can I merge trainLabel and trainFeatures_scaled into a new RDD of LabeledPoints that can be used to train a classifier (e.g., a random forest)?

I still can't find any documentation or references on this.
To convert to a dense vector, map each sparse vector through toArray. To merge the labels with the scaled features, use zip.