Merging CSV files with Python without repeating the header

import glob

interesting_files = glob.glob("/home/tcs/PYTHONMAP/test1/*.csv")

header_saved = False
with open('/home/tcs/PYTHONMAP/output.csv','wb') as fout:
    for filename in interesting_files:
        with open(filename) as fin:
            header = next(fin)
            if not header_saved:
                fout.write(header)
                header_saved = True
            for line in fin:
                fout.write(line)

2 Answers

User

Answer #1 · edited 2024-09-29 00:18:21

Use pandas:

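import glob
import pandas as pd

interesting_files = glob.glob("/home/tcs/PYTHONMAP/test1/*.csv")
# Read each file (its first row is the header) and stack them into one DataFrame
df = pd.concat((pd.read_csv(f, header=0) for f in interesting_files))
# index=False keeps the row index out of the output file
df.to_csv("output.csv", index=False)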

To also remove duplicate rows, do the following:

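import glob
import pandas as pd

interesting_files = glob.glob("/home/tcs/PYTHONMAP/test1/*.csv")
df = pd.concat((pd.read_csv(f, header=0) for f in interesting_files))
# Drop rows that are exact duplicates, keeping the first occurrence
df_deduplicated = df.drop_duplicates()
# index=False keeps the row index out of the output file
df_deduplicated.to_csv("output.csv", index=False)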

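This does not remove duplicates while the DataFrame is being built, only afterwards. The DataFrame is first created by concatenating all the files, and then it is deduplicated. The resulting DataFrame can then be saved to CSV.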
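
The same order of operations (concatenate first, deduplicate afterwards, then save) can also be sketched with only the standard library. This is an illustrative alternative, not the method used in the answer above. The function name `merge_and_deduplicate` and its parameters are hypothetical, and the sketch assumes every input file has the same header row.

```python
import csv


def merge_and_deduplicate(paths, out_path):
    """Concatenate CSV files, then drop repeated rows (first occurrence wins).

    Hypothetical helper: assumes all input files share the same header row.
    Returns the number of data rows written.
    """
    header = None
    rows = []

    # Step 1: concatenate the data rows of every file.
    for path in paths:
        with open(path, newline="") as fin:
            reader = csv.reader(fin)
            file_header = next(reader, None)  # None if the file is empty
            if file_header is None:
                continue
            if header is None:
                header = file_header  # remember the header from the first file
            rows.extend(reader)

    # Step 2: deduplicate afterwards, preserving the original row order.
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, tuples are
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)

    # Step 3: write the header once, followed by the unique rows.
    with open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        if header is not None:
            writer.writerow(header)
        writer.writerows(unique_rows)
    return len(unique_rows)
```

It could be called as `merge_and_deduplicate(glob.glob("/home/tcs/PYTHONMAP/test1/*.csv"), "output.csv")`. All rows are held in memory before writing, so for very large inputs the pandas version or a streaming approach may be a better fit.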

User

Answer #2 · edited 2024-09-29 00:18:21

import glob
import csv

interesting_files = glob.glob("/home/tcs/PYTHONMAP/test1/*.csv")

header_saved = False
# newline='' lets the csv module control line endings itself
with open('/home/tcs/PYTHONMAP/output.csv', 'w', newline='') as fout:
    writer = csv.writer(fout)
    for filename in interesting_files:
        with open(filename, newline='') as fin:
            reader = csv.reader(fin)
            header = next(reader)  # the first row of each file is its header
            if not header_saved:
                writer.writerow(header)  # write the header only once
                header_saved = True
            writer.writerows(reader)  # copy the remaining data rows
