How to merge two files by header line using Python

Published 2024-10-01 11:40:32


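I thought this would be easy, but I haven't been able to solve it. I have two files, shown below, and I want to merge them so that the lines starting with > in file1 become the headers for the lines in file2.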

File 1:

>seq12
ACGCTCGCA
>seq34
GCATCGCGT
>seq56
GCGATGCGC

File 2:

ATCGCGCATGATCTCAG
AGCGCGCATGCGCATCG
AGCAAATTTAGCAACTC

So the desired output is:

>seq12
ATCGCGCATGATCTCAG
>seq34
AGCGCGCATGCGCATCG
>seq56
AGCAAATTTAGCAACTC

So far I have tried the code below, but in its output every line taken from file2 is the same:

from Bio import SeqIO

with open(file1) as fw:
    with open(file2,'r') as rv:
        for line in rv:
            items = line
        for record in SeqIO.parse(fw, 'fasta'):
            print('>' + record.id)
            print(line)

2 Answers

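If the files cannot be held in memory, you need a solution that reads each file line by line and writes to the output file as it goes. The program below does exactly that. The comments try to clarify it, though I believe the code speaks for itself.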

with open("file1.txt") as first, open("file2.txt") as second, open("output.txt", "w") as output:
    while True:
        line_first = first.readline()       # line from file1 (header)
        line_second = second.readline()     # line from file2 (body)
        if not (line_first and line_second):
            # if any file has ended
            break

        # write to output file
        output.writelines([line_first, line_second])
        # jump one line from file1
        first.readline()

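Note that this only works if file1.txt has exactly the format you showed (odd-numbered lines are headers, even-numbered lines are discarded). To allow more customization, you can wrap it in a function like this: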

def merge_files(header_file_path, body_file_path, output_file="output.txt", every_n_lines=2):
    with open(header_file_path) as first, open(body_file_path) as second, open(output_file, "w") as output:
        while True:
            line_first = first.readline()       # line from header
            line_second = second.readline()     # line from body
            if not (line_first and line_second):
                # if any file has ended
                break

            # write to output file
            output.writelines([line_first, line_second])
            # jump n lines from header
            for _ in range(every_n_lines - 1):
                first.readline()

Calling merge_files("file1.txt", "file2.txt") then does the job.
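
The same position-based pairing can also be written with itertools.islice and zip instead of explicit readline calls. The following is only a minimal sketch and not part of the original answer: the name merge_files_islice is hypothetical, and it assumes the header is the first line of every group of every_n_lines lines in the header file.

```python
from itertools import islice


def merge_files_islice(header_path, body_path, output_path="output.txt", every_n_lines=2):
    """Pair every n-th line of the header file with successive lines of the body file.

    Hypothetical helper for illustration; assumes headers sit on lines
    0, n, 2n, ... (0-based) of the header file.
    """
    with open(header_path) as headers, open(body_path) as body, open(output_path, "w") as output:
        # islice walks the header file lazily and keeps one line out of every n.
        header_lines = islice(headers, 0, None, every_n_lines)
        # zip stops as soon as either file runs out of lines.
        for header, sequence in zip(header_lines, body):
            # Normalise line endings so a missing final newline cannot glue lines together.
            output.write(header.rstrip("\n") + "\n")
            output.write(sequence.rstrip("\n") + "\n")
```

Both files are still read one line at a time, so memory use stays small regardless of file size.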

If both files are small enough to fit in memory at the same time, you can simply read them both and interleave them.

# Open two file handles.
with open("f1", mode="r") as f1, open("f2", mode="r") as f2:
    lines_first = f1.readlines()   # Read all lines in f1.
    lines_second = f2.readlines()  # Read all lines in f2.

lines_out = []

# For each line in the file without headers...
for idx in range(len(lines_second)):
    # Take the matching header from the first file (indices 0, 2, 4, ...)
    # and put it in front of the line from the second file.
    lines_out.append(lines_first[2 * idx].rstrip())
    lines_out.append(lines_second[idx].rstrip())

Since the headers in this example follow a pattern (seq12, seq34, seq56), you could also generate them from idx instead of reading them from the first file; I leave that as an exercise for the reader.
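
For completeness, here is a compact in-memory variant that also writes the result to disk. It is a sketch under stated assumptions rather than the answerer's code: the function name interleave_in_memory is hypothetical, and it assumes the headers are on every other line of the first file, starting with its first line.

```python
def interleave_in_memory(header_path, body_path, output_path):
    """Read both files fully, then write header/sequence pairs.

    Hypothetical helper; assumes headers are on every other line of the
    header file, starting with the first line.
    """
    with open(header_path) as f1, open(body_path) as f2:
        lines_first = f1.read().splitlines()
        lines_second = f2.read().splitlines()

    headers = lines_first[::2]  # indices 0, 2, 4, ... hold the '>' lines

    lines_out = []
    # zip pairs each header with one body line and stops at the shorter list.
    for header, sequence in zip(headers, lines_second):
        lines_out.append(header)
        lines_out.append(sequence)

    with open(output_path, "w") as out:
        out.writelines(line + "\n" for line in lines_out)
    return lines_out
```

Using zip here avoids an IndexError when the second file has more lines than the first file has headers; the extra lines are simply ignored.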

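If one or both files are too large to fit in memory, you can repeat the procedure above line by line on the two file handles, using a variable to hold the current header.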
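
One way this line-by-line approach might look is sketched below. The function name merge_by_header is hypothetical, and the sketch assumes that every header line starts with > and that each header should receive exactly one line from the second file.

```python
def merge_by_header(header_path, body_path, output_path):
    """Stream both files, giving each '>' header the next line of the body file.

    Hypothetical helper; assumes header lines start with '>' and each
    header is paired with exactly one line from the body file.
    """
    with open(header_path) as headers, open(body_path) as body, open(output_path, "w") as output:
        for line in headers:
            if not line.startswith(">"):
                continue  # skip the old sequence lines from the header file
            header = line.rstrip("\n")  # variable holding the current header
            replacement = body.readline()
            if not replacement:
                break  # the body file has no more lines
            output.write(header + "\n")
            output.write(replacement.rstrip("\n") + "\n")
```

Because headers are recognised by their > prefix rather than by position, this version also copes with a first file whose sequences span several lines.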
