How do I combine multiple DataFrames using a for loop?

Posted 2024-10-05 11:01:18


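I am trying to combine multiple columns so that each column starts at a specific index after the previous one. For example, as you can see in the code below, I have 15 sets of data, df20 through df90. As the code shows, I merge the data i, and then merge another copy of it starting at index=1000.

So I want my output to be df20, then df25 starting at index 1000, then df30 starting at index 2000, then df35 starting at index 3000. I want to see all 15 columns, but my output contains only one.

I tried the code below, but it does not seem to work. Please help.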

dframe = [df20, df25, df30, df35, df40, df45, df50, df55, df60, df65, df70, df75, df80, df85, df90]
for i in dframe:
  a = i.merge((i).set_index((i).index+1000), how='outer', left_index=True, right_index=True)

print(a)

Output:

                      df90_x              df90_y
0                     0.000757                      NaN
1                     0.001435                      NaN
2                     0.002011                      NaN
3                     0.002497                      NaN
4                     0.001723                      NaN
...                        ...                      ...
10995                      NaN             1.223000e-12
10996                      NaN             1.305000e-12
10997                      NaN             1.809000e-12
10998                      NaN             2.075000e-12
10999                      NaN             2.668000e-12

[11000 rows x 2 columns]

Expected output:

                      df20                 df25                  df30
0                     0.000757             0                     0
1                     0.001435             0                     0
2                     0.002011             0                     0
3                     0.002497             0                     0
4                     0.001723             0                     0
...                  ...                   ...                   ...
1000                                      1.223000e-12           0
1001                                      1.305000e-12           0
1002                                      1.809000e-12           0
1003                                      2.668000e-12           0
...                                                              ...
2000                                                             0.1234
2001                                                             0.4567
2002                                                             0.8901
2003                                                             0.2345

2 answers

Please see the official page.


Concatenating multiple DataFrames

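import pandas as pd

# Three single-column frames whose indexes do not overlap.
df1 = pd.DataFrame(
    {"A": ["A0", "A1", "A2", "A3"]},
    index=[0, 1, 2, 3],
)
df2 = pd.DataFrame(
    {"B": ["B4", "B5"]},
    index=[4, 5],
)
df3 = pd.DataFrame(
    {"C": ["C6", "C7", "C8", "C9", "C10"]},
    index=[6, 7, 8, 9, 10],
)

# axis=1 places the frames side by side and aligns rows on the index.
result = pd.concat([df1, df2, df3], axis=1)
print(result)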

Output:

      A    B    C
0    A0  NaN  NaN
1    A1  NaN  NaN
2    A2  NaN  NaN
3    A3  NaN  NaN
4   NaN   B4  NaN
5   NaN   B5  NaN
6   NaN  NaN   C6
7   NaN  NaN   C7
8   NaN  NaN   C8
9   NaN  NaN   C9
10  NaN  NaN  C10
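
The NaN cells appear because `pd.concat(..., axis=1)` aligns rows on the union of the index labels, so every frame is empty wherever another frame's labels sit. The expected output in the question shows 0 in those positions, which `fillna(0)` can provide. A minimal sketch with made-up numeric values (the frames and column names here are illustrative only):

```python
import pandas as pd

# Two small numeric frames with non-overlapping indexes (illustrative values).
df1 = pd.DataFrame({"A": [1.0, 2.0]}, index=[0, 1])
df2 = pd.DataFrame({"B": [3.0, 4.0]}, index=[2, 3])

# Side-by-side concat leaves NaN where a frame has no row; fill those with 0.
result = pd.concat([df1, df2], axis=1).fillna(0)
print(result)
```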

Loading files into a list with a loop

Method 1: You can create a list and put all the file names into it

filenames = ['sample_20.csv', 'sample_25.csv', 'sample_30.csv', ...]
dataframes = [pd.read_csv(f) for f in filenames]

Method 1-1: If you really have a lot of files, you need a quicker way to build the list of names

filenames = ['sample_{}.csv'.format(i) for i in range(20, 95, 5)]  # 20, 25, ..., 90 (the stop value is exclusive)
dataframes = [pd.read_csv(f) for f in filenames]

Method 2:

from glob import glob
filenames = glob('sample*.csv')
dataframes = [pd.read_csv(f) for f in filenames]
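
One caveat with Method 2: `glob` does not promise any particular order, so the frames may not come back as 20, 25, 30, and so on. A rough sketch of sorting by the number in the file name follows. The `sample_<n>.csv` naming, the temporary directory, and the helper `file_number` are assumptions made so the example runs on its own.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical setup: write three tiny CSV files, deliberately out of order.
tmpdir = Path(tempfile.mkdtemp())
for n in (30, 20, 25):
    pd.DataFrame({"value": [n, n + 1]}).to_csv(tmpdir / f"sample_{n}.csv", index=False)


def file_number(path):
    # Pull the number out of a name such as "sample_25.csv" (hypothetical helper).
    return int(re.search(r"(\d+)", path.stem).group(1))


# Sort the matches numerically before reading them.
paths = sorted(tmpdir.glob("sample_*.csv"), key=file_number)
dataframes = [pd.read_csv(p) for p in paths]
names = [p.stem for p in paths]
print(names)
```

Keeping `names` next to `dataframes` makes it easy to label the columns later, for example by renaming each frame's column to the matching name.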

If you want the number of DataFrames and their length as variables (num_dataframe and len_dataframe), you can try this code:

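import pandas as pd
import random

dframe = []
num_dataframe = 3
len_dataframe = 5

for i in range(num_dataframe):
    # Column i holds random integers; its index covers rows i*len_dataframe .. (i+1)*len_dataframe - 1.
    dframe.append(pd.DataFrame({i: [random.randrange(1, 50) for _ in range(len_dataframe)]},
                               index=range(i * len_dataframe, (i + 1) * len_dataframe)))

# Place the frames side by side, aligned on the index.
result = pd.concat(dframe, axis=1)

# fillna returns a new frame, so keep the result; gaps (NaN) become 0.
result = result.fillna(0)
print(result)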


For your problem, you need 20 DataFrames of length 1000, so you can try the following:

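import numpy as np
import pandas as pd

dframe = []
num_dataframe = 20
len_dataframe = 1000

for i in range(num_dataframe):
    # Column i holds random floats; its index covers rows i*len_dataframe .. (i+1)*len_dataframe - 1.
    dframe.append(pd.DataFrame({i: [np.random.random() for _ in range(len_dataframe)]},
                               index=range(i * len_dataframe, (i + 1) * len_dataframe)))

# Place the frames side by side, aligned on the index.
result = pd.concat(dframe, axis=1)

# fillna returns a new frame, so keep the result; gaps (NaN) become 0.
result = result.fillna(0)
print(result)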


As you mentioned in the comments, I have edited this post and added the following code:

dframe = [df20, df25, df30, df35, df40, df45, df50, df55, df60, df65, df70, df75, df80, df85, df90]

# axis=0 stacks the frames one below another.
result = pd.concat(dframe, axis=0)

# fillna returns a new frame, so keep the result.
result = result.fillna(0)
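
In the loop from the question, `a` is rebound on every pass and each frame is merged with a shifted copy of itself, so only the last frame (df90) survives. One possible way to get the staggered layout is to shift the k-th frame's index by k times the offset and then concatenate along `axis=1`. The sketch below uses tiny stand-in frames and an offset of 2 instead of 1000. It assumes each frame has a single, uniquely named column and a default 0..n-1 index, which the question does not state explicitly.

```python
import pandas as pd

# Stand-ins for df20, df25, df30 (illustrative values, one column each).
df20 = pd.DataFrame({"df20": [0.1, 0.2, 0.3]})
df25 = pd.DataFrame({"df25": [1.1, 1.2, 1.3]})
df30 = pd.DataFrame({"df30": [2.1, 2.2, 2.3]})
dframe = [df20, df25, df30]

offset = 2  # the question uses 1000

# Shift the k-th frame so that it starts at row k * offset.
shifted = [df.set_index(df.index + k * offset) for k, df in enumerate(dframe)]

# Align on the union of the shifted indexes and fill the gaps with 0.
result = pd.concat(shifted, axis=1).fillna(0)
print(result)
```

Frames longer than the offset overlap, as rows 2 and 4 do here. Because the frames sit in separate columns, overlapping rows simply hold values in more than one column.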
