动态拆分数据帧

2024-09-28 17:01:21 发布

您现在位置:Python中文网/ 问答频道 /正文

我有300行的df,它们并不总是平均分布的。它们看起来像这样:

Lags     Rep 1      Rep 2     Rep 3 
12.500000000E-9     7671.039418     6605.763724     10144.873125
25.000000000E-9     -1.000000   -0.479659   1.454251
37.500000000E-9     31.978402   23.456005   29.678136
50.000000000E-9     5.315013    4.723746    0.227125
62.500000000E-9     1.806673    2.642384    2.681376
75.000000000E-9     NaN     NaN     NaN
83.500000000E-9     NaN     NaN     NaN

Time    PhtA count 1     PhtA count 2     PhtA count 3
0.000000000E+0  42.743683   10.890961   12.454987
2.428800000E-3  14.533997   8.125305    7.534027
4.857600000E-3  8.621216    7.686615    7.133484
7.286400000E-3  5.779266    10.147095   6.561279
9.715200000E-3  6.046295    8.201599    5.187988
12.144000000E-3     5.226135    7.343292    5.855560

Time    PhtB count 1     PhtB count 2     PhtB count 3
0.860800000E-3  12.626648   13.580322   8.220673
1.289600000E-3  10.814667   21.381378   7.038116
2.718400000E-3  7.915497    17.261505   7.648468
3.147200000E-3  9.403229    21.266937   10.013580

拆分时,最好有3个这样的dfs:

第一个df:

Lags     Rep 1      Rep 2     Rep 3 
12.500000000E-9     7671.039418     6605.763724     10144.873125
25.000000000E-9     -1.000000   -0.479659   1.454251
37.500000000E-9     31.978402   23.456005   29.678136
50.000000000E-9     5.315013    4.723746    0.227125
62.500000000E-9     1.806673    2.642384    2.681376

第二个df:

Time    PhtA count 1     PhtA count 2     PhtA count 3
0.000000000E+0  42.743683   10.890961   12.454987
2.428800000E-3  14.533997   8.125305    7.534027
4.857600000E-3  8.621216    7.686615    7.133484
7.286400000E-3  5.779266    10.147095   6.561279
9.715200000E-3  6.046295    8.201599    5.187988
12.144000000E-3     5.226135    7.343292    5.855560

第三个df

Time    PhtB count 1     PhtB count 2     PhtB count 3
0.860800000E-3  12.626648   13.580322   8.220673
1.289600000E-3  10.814667   21.381378   7.038116
2.718400000E-3  7.915497    17.261505   7.648468
3.147200000E-3  9.403229    21.266937   10.013580

三个块的长度并不总是相同的,这就是为什么我请求帮助以编程的方式解决这个问题。关于第一个df,我可以说的几个细节是:

  • 第一个块总是以一堆值为NaN的行结束(在本例中只有两行)

  • 还有两个以命名列标题开头的块(Time,PhtA count 1,PhtA count 2,…)

  • 最后两个块没有任何NaN值

  • 所有块的行数都是可变的,尽管标题总是相同的

  • 始终有一个空行分隔块

任何帮助都将不胜感激。你知道吗

提前谢谢。你知道吗


Tags: 标题dftime编程count方式nan命名
1条回答
网友
1楼 · 发布于 2024-09-28 17:01:21

首先将所有数据读入保留空行的df,然后在这些空行处拆分并转换为数字:

df = pd.read_csv('data.csv', sep='\s{2,}', skip_blank_lines=False, engine='python')
x = df[df.Lags.isnull()==True].index.values

df1 = df[0:x[0]].dropna().apply(pd.to_numeric)

df2 = df[x[0]+2:x[1]].apply(pd.to_numeric)
df2.columns=df.iloc[x[0]+1].values

df3 = df[x[1]+2:].apply(pd.to_numeric)
df3.columns = df.iloc[x[1]+1].values

print(df1);print(df2); print(df3)的输出:

           Lags        Rep 1        Rep 2         Rep 3
0  1.250000e-08  7671.039418  6605.763724  10144.873125
1  2.500000e-08    -1.000000    -0.479659      1.454251
2  3.750000e-08    31.978402    23.456005     29.678136
3  5.000000e-08     5.315013     4.723746      0.227125
4  6.250000e-08     1.806673     2.642384      2.681376
        Time  PhtA count 1  PhtA count 2  PhtA count 3
9   0.000000     42.743683     10.890961     12.454987
10  0.002429     14.533997      8.125305      7.534027
11  0.004858      8.621216      7.686615      7.133484
12  0.007286      5.779266     10.147095      6.561279
13  0.009715      6.046295      8.201599      5.187988
14  0.012144      5.226135      7.343292      5.855560
        Time  PhtB count 1  PhtB count 2  PhtB count 3
17  0.000861     12.626648     13.580322      8.220673
18  0.001290     10.814667     21.381378      7.038116
19  0.002718      7.915497     17.261505      7.648468
20  0.003147      9.403229     21.266937     10.013580


好处:csv中任意数量的数据块的通用解决方案,以空行分隔(它们的数量不需要事先知道):
df = pd.read_csv('data.csv', sep='\s{2,}', skip_blank_lines=False, engine='python', header=None)
x = [-1] + list(df[df.iloc[:,0].isnull()==True].index.values) + [len(df)]
for i in range(1,len(x)):
     globals()[f'df{i}'] = df[x[i-1]+2:x[i]].dropna().apply(pd.to_numeric)
     globals()[f'df{i}'].columns = df.iloc[x[i-1]+1].values

相关问题 更多 >