Problem processing a large text file with pandas in Python

Posted 2024-05-20 13:36:24


I have a text file like the small example below:

small example

0,1,2,3,4,5,6
chr1,144566,144597,30,chr1,120000,210000
chr1,154214,154245,34,chr1,120000,210000
chr1,228904,228935,11,chr1,210000,240000
chr1,233265,233297,13,chr1,210000,240000
chr1,233266,233297,58,chr1,210000,240000
chr1,235438,235469,36,chr1,210000,240000
chr1,262362,262393,16,chr1,240000,610000
chr1,347253,347284,12,chr1,240000,610000
chr1,387022,387053,38,chr1,240000,610000

I want to remove the first line and write a tab-separated file instead of a comma-separated one, like the expected output below:

expected output

chr1    144566  144597  30  chr1    120000  210000
chr1    154214  154245  34  chr1    120000  210000
chr1    228904  228935  11  chr1    210000  240000
chr1    233265  233297  13  chr1    210000  240000
chr1    233266  233297  58  chr1    210000  240000
chr1    235438  235469  36  chr1    210000  240000
chr1    262362  262393  16  chr1    240000  610000
chr1    347253  347284  12  chr1    240000  610000
chr1    387022  387053  38  chr1    240000  610000

I tried to do this with pandas in Python and wrote the code below, but it doesn't give me what I want. Do you know how to fix it?

import pandas
file = open('myfile.txt', 'rb')
new =[]
for line in file:
    new.append(line.split(','))
    df = pd.DataFrame(new)
    df.to_csv('outfile.txt', index=False)
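For reference, the loop approach from the question can be repaired roughly as follows. The original code opens the file in binary mode ('rb'), so in Python 3 each line is bytes and split(',') with a str separator raises TypeError; it imports pandas but refers to pd; it rebuilds the DataFrame and rewrites the file on every iteration; it never skips the header line; and it calls to_csv without sep='\t', so the output stays comma-separated and gains an extra row of column labels. This is a minimal sketch assuming Python 3 and a file shaped like the small example; the function name comma_to_tab is hypothetical.

```python
import pandas as pd


def comma_to_tab(src_path, dst_path):
    """Hypothetical helper: comma-separated -> tab-separated, header row dropped."""
    rows = []
    # Text mode ("r"), not "rb": binary mode yields bytes in Python 3,
    # and bytes.split(",") with a str separator raises TypeError.
    with open(src_path, "r") as f:
        next(f, None)  # skip the header line "0,1,2,3,4,5,6"
        for line in f:
            line = line.rstrip("\n")
            if line:  # ignore blank lines
                rows.append(line.split(","))
    # Build the DataFrame once, after the loop, not on every iteration.
    df = pd.DataFrame(rows)
    # sep="\t" gives tab separation; header=False and index=False suppress
    # the column labels and row index that to_csv writes by default.
    df.to_csv(dst_path, sep="\t", header=False, index=False)
```

Calling comma_to_tab('myfile.txt', 'outfile.txt') should then produce the expected output. It still holds every row in memory, which matters for a very large file.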

2 answers
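import pandas as pd
# header=0 reads the first line ("0,1,2,...") as column names rather than data
df = pd.read_csv('myfile.txt', header=0)
# tab-separated output, without the column names or the row index
df.to_csv('outfile.txt', sep='\t', index=False, header=False)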
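If the file is too large to load in one go but you want to stay with pandas, read_csv's chunksize parameter returns an iterator of smaller DataFrames, and each one can be appended to the same output handle. This is a sketch assuming pandas 1.2 or later (where the chunked reader works as a context manager); the function name comma_to_tab_chunked and the default chunk size are hypothetical choices, and dtype=str is an added assumption so every field is written back exactly as read.

```python
import pandas as pd


def comma_to_tab_chunked(src_path, dst_path, chunksize=100_000):
    """Hypothetical helper: stream a large CSV to a TSV chunk by chunk."""
    # header=0 treats the first line as column names, so it is never written;
    # dtype=str avoids any numeric conversion of the fields.
    with pd.read_csv(src_path, header=0, dtype=str, chunksize=chunksize) as reader:
        # pandas recommends newline="" for text handles passed to to_csv.
        with open(dst_path, "w", newline="") as out:
            for chunk in reader:
                # Each chunk is appended to the same open file.
                chunk.to_csv(out, sep="\t", header=False, index=False)
```

Only one chunk is in memory at a time, so peak memory depends on chunksize rather than on the size of the file.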

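Depending on the size of the file, it may be more efficient to skip pandas and use plain Python I/O. That way you never read the whole file into memory; you read it line by line and write each line to a new, tab-separated file: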

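with open("myfile.txt", "r") as r, open("myfile2.txt", "w") as w:
    next(r, None)  # skip the header line "0,1,2,3,4,5,6"
    for line in r:
        # the last field keeps its newline, so joining with tabs preserves line breaks
        w.write("\t".join(line.split(',')))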

myfile2.txt is now a tab-separated version of myfile.txt.
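The plain split(',') approach breaks if a field is ever quoted and contains a comma. The standard library csv module streams line by line in the same way but parses quoting correctly. This is a sketch, not part of the original answer; the function name comma_to_tab_csv is hypothetical, and it assumes the first line is always a header to be dropped.

```python
import csv


def comma_to_tab_csv(src_path, dst_path):
    """Hypothetical helper: line-by-line CSV -> TSV using the csv module."""
    # The csv module documentation recommends newline="" for these files.
    with open(src_path, "r", newline="") as r, open(dst_path, "w", newline="") as w:
        reader = csv.reader(r)
        writer = csv.writer(w, delimiter="\t", lineterminator="\n")
        next(reader, None)  # drop the header row
        for row in reader:
            if row:  # csv.reader yields [] for blank lines; skip them
                writer.writerow(row)
```

Memory use stays constant as with the answer above; the trade-off is a little more per-line parsing work in exchange for correct handling of quoted fields.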
