使用pandas读取带有稀疏标记列标题的CSV

2024-06-25 06:43:16 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个.csv文件,我试图读入一个有多行列标题的pandas数据框,但是第一行的标签很少。你知道吗

例如:

Binned_average_and_predicted_H2O_spectra_sorted_by_RH-class.,,,,,,,,
,RH=0.8,,,,RH=0.9,,,
,n_=_60,,,,n_=_29,,,
nat_freq,avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o),avrg_sp(T),avrg_sp(h2o),denoised_avrg_sp(h2o),pred_sp(h2o)
6.10E-04,8.40E-02,0.117551351,0.117550357,8.64E-02,0.128696811,0.163304381,0.163304015,0.127552704
1.22E-03,7.49E-02,0.126467592,0.126465605,7.70E-02,9.05E-02,0.200350295,0.200349563,8.97E-02
1.83E-03,7.54E-02,0.124370072,0.124367091,7.76E-02,8.54E-02,0.121274897,0.121273799,8.46E-02
2.44E-03,7.76E-02,0.136590839,0.136586865,7.99E-02,5.45E-02,0.100995665,0.100994202,5.40E-02
3.05E-03,8.73E-02,0.141422799,0.141417832,8.98E-02,7.57E-02,0.170033442,0.170031614,7.50E-02
3.66E-03,7.29E-02,0.143599074,0.143593115,7.50E-02,0.10001777,0.165468366,0.165466173,9.91E-02

当我读csv的时候

Cosp2 = pd.read_csv(DPath,index_col=0, header=[1,3])
print(Cosp2)

最后,在所有未显式标记的头的第一级头上都有未命名的:##u level_0标签。你知道吗

             RH=0.8 Unnamed: 2_level_0    Unnamed: 3_level_0  \
nat_freq avrg_sp(T)       avrg_sp(h2o) denoised_avrg_sp(h2o)   
0.00061      0.0840           0.117551              0.117550   
0.00122      0.0749           0.126468              0.126466   
0.00183      0.0754           0.124370              0.124367   
0.00244      0.0776           0.136591              0.136587   
0.00305      0.0873           0.141423              0.141418   
0.00366      0.0729           0.143599              0.143593   

         Unnamed: 4_level_0     RH=0.9 Unnamed: 6_level_0  \
nat_freq       pred_sp(h2o) avrg_sp(T)       avrg_sp(h2o)   
0.00061              0.0864   0.128697           0.163304   
0.00122              0.0770   0.090500           0.200350   
0.00183              0.0776   0.085400           0.121275   
0.00244              0.0799   0.054500           0.100996   
0.00305              0.0898   0.075700           0.170033   
0.00366              0.0750   0.100018           0.165468   

            Unnamed: 7_level_0 Unnamed: 8_level_0  
nat_freq denoised_avrg_sp(h2o)       pred_sp(h2o)  
0.00061               0.163304           0.127553  
0.00122               0.200350           0.089700  
0.00183               0.121274           0.084600  
0.00244               0.100994           0.054000  
0.00305               0.170032           0.075000  
0.00366               0.165466           0.099100 

有没有办法让pandas在未标记的列中传播0级标签?我想要这样的东西:

             RH=0.8                                                  \
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)   
0.00061      0.0840     0.117551              0.117550       0.0864   
0.00122      0.0749     0.126468              0.126466       0.0770   
0.00183      0.0754     0.124370              0.124367       0.0776   
0.00244      0.0776     0.136591              0.136587       0.0799   
0.00305      0.0873     0.141423              0.141418       0.0898   
0.00366      0.0729     0.143599              0.143593       0.0750   

             RH=0.9                                                  
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)  
0.00061    0.128697     0.163304              0.163304     0.127553  
0.00122    0.090500     0.200350              0.200350     0.089700  
0.00183    0.085400     0.121275              0.121274     0.084600  
0.00244    0.054500     0.100996              0.100994     0.054000  
0.00305    0.075700     0.170033              0.170032     0.075000  
0.00366    0.100018     0.165468              0.165466     0.099100 

Tags: csv标记pandas标签levelnatsprh
1条回答
网友
1楼 · 发布于 2024-06-25 06:43:16

您可以首先将^{}^{}一起用于Series

a = Cosp2.columns.get_level_values(0).to_series()
print (a)
RH=0.8                            RH=0.8
Unnamed: 2_level_0    Unnamed: 2_level_0
Unnamed: 3_level_0    Unnamed: 3_level_0
Unnamed: 4_level_0    Unnamed: 4_level_0
RH=0.9                            RH=0.9
Unnamed: 6_level_0    Unnamed: 6_level_0
Unnamed: 7_level_0    Unnamed: 7_level_0
Unnamed: 8_level_0    Unnamed: 8_level_0
dtype: object

如果^{}Unnamed,则对NaN使用^{},并将NaN替换为ffill^{}替换为method='ffill'

b = a.mask(a.str.startswith('Unnamed')).ffill()
print (b)
RH=0.8                RH=0.8
Unnamed: 2_level_0    RH=0.8
Unnamed: 3_level_0    RH=0.8
Unnamed: 4_level_0    RH=0.8
RH=0.9                RH=0.9
Unnamed: 6_level_0    RH=0.9
Unnamed: 7_level_0    RH=0.9
Unnamed: 8_level_0    RH=0.9
dtype: object

上次通过^{}创建新的MultiIndex

Cosp2.columns = pd.MultiIndex.from_arrays([b, Cosp2.columns.get_level_values(1)])
print (Cosp2)
             RH=0.8                                                  \
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)   
0.00061      0.0840     0.117551              0.117550       0.0864   
0.00122      0.0749     0.126468              0.126466       0.0770   
0.00183      0.0754     0.124370              0.124367       0.0776   
0.00244      0.0776     0.136591              0.136587       0.0799   
0.00305      0.0873     0.141423              0.141418       0.0898   
0.00366      0.0729     0.143599              0.143593       0.0750   

             RH=0.9                                                  
nat_freq avrg_sp(T) avrg_sp(h2o) denoised_avrg_sp(h2o) pred_sp(h2o)  
0.00061    0.128697     0.163304              0.163304     0.127553  
0.00122    0.090500     0.200350              0.200350     0.089700  
0.00183    0.085400     0.121275              0.121274     0.084600  
0.00244    0.054500     0.100996              0.100994     0.054000  
0.00305    0.075700     0.170033              0.170032     0.075000  
0.00366    0.100018     0.165468              0.165466     0.099100  

相关问题 更多 >