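I have a data frame with 2170 records; after vectorizing it I end up with more than 6000 columns.
When I run PCA, I need at least 1500 components to retain 0.5 of the variance.
So the resulting data is 2170 x 1500. Unsurprisingly, when I train a neural network (a simple one, 2-3 layers with 16 neurons each) or a linear regression on it, I get severe overfitting.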
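For readers who want to reproduce the component count, here is a minimal sketch of how the smallest number of components reaching a target variance fraction can be found. It uses plain NumPy and a small synthetic 0/1 matrix as a stand-in for the real frame; the function name `components_for_variance` and the synthetic data are assumptions for illustration, not part of the original post.

```python
import numpy as np

def components_for_variance(X, target=0.5):
    """Return (k, scores, cum): the smallest k whose cumulative explained
    variance ratio reaches `target`, the data projected onto those k
    principal components, and the full cumulative ratio curve."""
    Xc = X - X.mean(axis=0)                      # PCA works on centered data
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)                  # explained variance ratio per component
    cum = np.cumsum(ratio)
    # first index where the cumulative ratio is >= target, converted to a count
    k = min(int(np.searchsorted(cum, target)) + 1, len(s))
    return k, Xc @ Vt[:k].T, cum

# Hypothetical stand-in for the vectorized frame: sparse 0/1 indicator columns
rng = np.random.default_rng(0)
X_demo = (rng.random((200, 500)) < 0.02).astype(float)

k, scores, cum = components_for_variance(X_demo, target=0.5)
print(scores.shape, k)
```

With mostly independent one-hot columns like these, the variance is spread thinly across many directions, which is consistent with needing a large number of components to reach even half of it.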
The source dataset I ran PCA on looks like this:
df_in.head(5)
Out[204]:
'magic' matt alan 50 cent a.j. cook a.j. wone \
Product_name
like minds 0 0 0 0
16 years of alcohol 0 0 0 0
gisaku 0 0 0 0
deadly cargo 0 0 0 0
to die in san hilario 0 0 0 0
aaron abrams aaron b. oduber aaron burns \
Product_name
like minds 0 0 0
16 years of alcohol 0 0 0
gisaku 0 0 0
deadly cargo 0 0 0
to die in san hilario 0 0 0
aaron carter aaron eckhart aaron kwok ... \
Product_name ...
like minds 0 0 0 ...
16 years of alcohol 0 0 0 ...
gisaku 0 0 0 ...
deadly cargo 0 0 0 ...
to die in san hilario 0 0 0 ...
new millenum rise of hollywood silent era \
Product_name
like minds 1 0 0
16 years of alcohol 1 0 0
gisaku 1 0 0
deadly cargo 0 0 0
to die in san hilario 1 0 0
transient era imdbRating imdbVotes Academy_wins \
Product_name
like minds 0 6.3 4387 0
16 years of alcohol 0 6.3 1539 0
gisaku 0 5.7 266 0
deadly cargo 0 4.7 483 0
to die in san hilario 0 6.4 199 0
Academy_nominations Other_wins Other_nominations
Product_name
like minds 0 0 0
16 years of alcohol 0 4 9
gisaku 0 0 1
deadly cargo 0 0 0
to die in san hilario 0 2 3
df_in.shape
Out[205]: (2172, 6150)
How can I add noise to the data in order to augment it?
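One common way to do this, shown here only as a sketch, is to append jittered copies of each training row, with zero-mean Gaussian noise scaled to each column's standard deviation. The helper `augment_with_noise`, its parameters (`n_copies`, `noise_scale`), and the target array `y` are hypothetical names introduced for illustration; the post does not say which column is the target. This kind of jitter makes more sense on continuous features (for example the PCA scores) than on the raw 0/1 indicator columns, and it should be applied to the training split only.

```python
import numpy as np

def augment_with_noise(X, y, n_copies=2, noise_scale=0.05, seed=0):
    """Stack the original rows with `n_copies` noisy copies.

    Noise is zero-mean Gaussian with a per-column standard deviation of
    `noise_scale` times that column's own standard deviation. Targets are
    repeated unchanged for every copy.
    """
    rng = np.random.default_rng(seed)
    col_std = X.std(axis=0)                       # per-feature scale
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        noise = rng.normal(0.0, 1.0, size=X.shape) * (noise_scale * col_std)
        X_parts.append(X + noise)
        y_parts.append(y)
    return np.vstack(X_parts), np.concatenate(y_parts)

# Hypothetical continuous features and targets, standing in for the real data
rng = np.random.default_rng(1)
X_train = rng.normal(size=(10, 4))
y_train = np.arange(10, dtype=float)

X_aug, y_aug = augment_with_noise(X_train, y_train, n_copies=2)
print(X_aug.shape, y_aug.shape)
```

Whether this reduces the overfitting depends on the data. `noise_scale` is a tuning knob to validate on held-out rows that were not augmented.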