Create one CSV file from multiple files in the same folder

import pandas as pd
import glob

path = # add path
all_files = glob.glob(path + ".csv") # look for all the csv files in that folder. Probably this is not the right code for looking at them

file_list = []
for filename in all_files:
    df = pd.read_csv(filename)
    file_list(df)

2 Answers

User

#1 · edited 2024-05-19 13:32:12

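You don't need to do anything complicated here. You know what the header row is, and you know you want everything except the header. Just open each file, skip its first line, and write the rest to the output. This is far more memory-efficient than holding a large number of DataFrames in memory.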

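import glob
import os

# `path` is the folder containing the CSV files, as in the question
with open("final_file.csv", "w") as outfile:
    for count, filename in enumerate(glob.glob(os.path.join(path, "*.csv"))):
        with open(filename) as infile:
            header = next(infile)
            if count == 0:
                outfile.write(header)  # keep the header from the first file only
            for line in infile:
                # make sure every row ends with a newline before writing it
                if not line.endswith("\n"):
                    line = line + "\n"
                outfile.write(line)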
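
The same streaming idea can be wrapped in a small function. This is only a sketch under a few assumptions: the name `merge_csv_files` and its parameters are made up for illustration, every file is expected to share the same header, and no field contains an embedded newline. It also skips the output file if it happens to sit in the folder being scanned, so a second run does not merge the previous result into itself.

```python
import glob
import os


def merge_csv_files(folder, out_path):
    """Stream every *.csv in `folder` into `out_path`, keeping one header.

    Returns the number of source files that were copied.
    """
    out_abs = os.path.abspath(out_path)
    # Sort for a deterministic order; skip the output file if it lives in `folder`.
    sources = [
        name for name in sorted(glob.glob(os.path.join(folder, "*.csv")))
        if os.path.abspath(name) != out_abs
    ]
    expected_header = None
    merged = 0
    with open(out_path, "w", newline="") as outfile:
        for name in sources:
            with open(name, newline="") as infile:
                header = infile.readline()
                if not header:
                    continue  # empty file: nothing to copy
                if expected_header is None:
                    # First non-empty file: write its header once.
                    expected_header = header.rstrip("\r\n")
                    outfile.write(expected_header + "\n")
                elif header.rstrip("\r\n") != expected_header:
                    raise ValueError(f"header mismatch in {name}")
                for line in infile:
                    # Normalise line endings so rows from different files never run together.
                    outfile.write(line.rstrip("\r\n") + "\n")
                merged += 1
    return merged
```

A call such as `merge_csv_files(path, "final_file.csv")` would then replace the loop above. Only one line is held in memory at a time, which is the point of this approach.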

User

#2 · edited 2024-05-19 13:32:12

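I would suggest using pd.concat to combine the DataFrames into one large DataFrame, which you can then save to another file if you wish.

Before concatenating the DataFrames, you may have to modify the call to pd.read_csv to make sure the data is parsed correctly. If the sample data in the question matches the contents of your CSV files verbatim, the code would look like this: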

import glob
import os

import pandas as pd

path = "/my_path"  # set this to the folder containing the CSVs
names = glob.glob(os.path.join(path, "*.csv"))  # names of all CSV files directly under path

# sep=" " assumes space-separated fields, as in the question's sample data.
# If your CSV files use commas to split fields, the sep argument
# can be omitted or set to ",".
file_list = pd.concat([pd.read_csv(filename, sep=" ") for filename in names])

# save the DataFrame to a file
file_list.to_csv("combined_data.csv")

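Note that each row in the combined DataFrame is still indexed by its row number in its source file, which creates duplicate row indices. To change this, call pd.DataFrame.reset_index():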

file_list = file_list.reset_index(drop=True)
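
To see the duplicate indices concretely, here is a minimal sketch with two small stand-in DataFrames (the data is invented for illustration and is not from the question). It also shows that passing `ignore_index=True` to `pd.concat` should give the same renumbering in one step.

```python
import pandas as pd

# Two tiny stand-in frames (hypothetical data, standing in for two CSV files)
first = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
second = pd.DataFrame({"x": [5, 6], "y": [7, 8]})

# Plain concat keeps each frame's own row labels, so labels repeat.
stacked = pd.concat([first, second])
print(list(stacked.index))      # [0, 1, 0, 1]

# ignore_index=True renumbers the rows 0..n-1 while concatenating.
renumbered = pd.concat([first, second], ignore_index=True)
print(list(renumbered.index))   # [0, 1, 2, 3]
```

If the index carries no meaning at all, another option is to leave it out of the saved file by passing `index=False` to `to_csv`.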
