Adding noise to the features of high-dimensional input data

Posted 2024-09-28 21:16:03


I have a DataFrame with 2170 records, and after vectorization it has more than 6000 columns.

When I run PCA, I need at least 1500 components to retain 0.5 of the variance.
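For reference, the component count for a given variance fraction can be checked roughly as below. This is a minimal NumPy sketch, not the code from the question: the helper name `n_components_for_variance` and the synthetic `X_demo` matrix are assumptions made for illustration.

```python
import numpy as np

def n_components_for_variance(X, target=0.5):
    """Return the smallest number of principal components whose
    cumulative explained-variance ratio reaches `target`."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # PCA works on centered columns
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values, descending
    ratio = s**2 / np.sum(s**2)                  # explained-variance ratio per component
    cumulative = np.cumsum(ratio)
    k = int(np.searchsorted(cumulative, target)) + 1
    return min(k, len(s))                        # guard against float round-off near 1.0

# Hypothetical stand-in for the real data: a sparse 0/1 indicator matrix.
rng = np.random.default_rng(0)
X_demo = (rng.random((200, 50)) < 0.05).astype(float)
k = n_components_for_variance(X_demo, target=0.5)
print(f"{k} of {X_demo.shape[1]} components reach 50% of the variance")
```

If scikit-learn is in use, `PCA(n_components=0.5, svd_solver="full")` selects the component count from a variance fraction in a similar way.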

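So the resulting data is 2170 × 1500. Unsurprisingly, when I train a neural network (a simple one with 2-3 layers of 16 neurons each) or a linear regression on it, I get severe overfitting.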

The source dataset that I run PCA on looks like this:

df_in.head(5)
Out[204]: 
                       'magic' matt alan  50 cent  a.j. cook  a.j. wone  \
Product_name                                                              
like minds                             0        0          0          0   
16 years of alcohol                    0        0          0          0   
gisaku                                 0        0          0          0   
deadly cargo                           0        0          0          0   
to die in san hilario                  0        0          0          0   

                       aaron abrams  aaron b. oduber  aaron burns  \
Product_name                                                        
like minds                        0                0            0   
16 years of alcohol               0                0            0   
gisaku                            0                0            0   
deadly cargo                      0                0            0   
to die in san hilario             0                0            0   

                       aaron carter  aaron eckhart  aaron kwok  ...  \
Product_name                                                    ...   
like minds                        0              0           0  ...   
16 years of alcohol               0              0           0  ...   
gisaku                            0              0           0  ...   
deadly cargo                      0              0           0  ...   
to die in san hilario             0              0           0  ...   

                       new millenum  rise of hollywood  silent era  \
Product_name                                                         
like minds                        1                  0           0   
16 years of alcohol               1                  0           0   
gisaku                            1                  0           0   
deadly cargo                      0                  0           0   
to die in san hilario             1                  0           0   

                       transient era  imdbRating  imdbVotes  Academy_wins  \
Product_name                                                                
like minds                         0         6.3       4387             0   
16 years of alcohol                0         6.3       1539             0   
gisaku                             0         5.7        266             0   
deadly cargo                       0         4.7        483             0   
to die in san hilario              0         6.4        199             0   

                       Academy_nominations  Other_wins  Other_nominations  
Product_name                                                               
like minds                               0           0                  0  
16 years of alcohol                      0           4                  9  
gisaku                                   0           0                  1  
deadly cargo                             0           0                  0  
to die in san hilario                    0           2                  3  

df_in.shape
Out[205]: (2172, 6150)


How can I add noise to the data in order to augment it?
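One common way to do this is to append copies of the training rows with small Gaussian noise added to each feature. The sketch below assumes a numeric feature matrix (for example the PCA output) and a target vector; the function name `augment_with_noise`, the parameters `n_copies` and `noise_scale`, and the toy arrays are all hypothetical, and the noise level would need tuning against a validation set.

```python
import numpy as np

def augment_with_noise(X, y, n_copies=3, noise_scale=0.1, seed=0):
    """Stack the original rows with `n_copies` noisy copies.

    Noise is zero-mean Gaussian, scaled per feature to
    `noise_scale` times that feature's standard deviation,
    so constant columns are left unchanged.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    std = X.std(axis=0)                          # per-feature spread
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        noise = rng.normal(0.0, 1.0, size=X.shape) * (noise_scale * std)
        X_parts.append(X + noise)                # perturbed copy of every row
        y_parts.append(y)                        # targets are reused unchanged
    return np.vstack(X_parts), np.concatenate(y_parts)

# Toy example (made-up numbers, not the data from the question).
X_train = np.array([[6.3, 4387.0], [6.3, 1539.0], [5.7, 266.0], [4.7, 483.0]])
y_train = np.array([1.0, 0.0, 0.0, 1.0])
X_aug, y_aug = augment_with_noise(X_train, y_train, n_copies=2)
print(X_aug.shape, y_aug.shape)
```

Only the training split should be augmented, so that validation and test scores still reflect unperturbed data. Noise on raw 0/1 indicator columns produces values that are no longer binary, which is one reason to apply it after PCA or only to the continuous columns.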

