How to split a file in Python using a specific marker line

Posted 2024-09-30 06:21:35


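How can I split one file into separate files using a specific marker line? An example is given below.

I have a file named "seq.txt" that contains 500 sequences (string data).

Now I want to separate all the sequences into individual files. I wrote the following code, but got stuck partway through: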

import glob

# collect every .txt file in the current directory
fname = glob.glob("*.txt")

seq = []

for fn in fname:
    with open(fn) as f:
        lines = f.readlines()
    for i, line in enumerate(lines):
        # marker lines start with ">"
        if line.startswith(">"):
            print(i, line, end="")

The input data looks like this:

> jai_Seq1flkgh456456
HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH
KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD
KJSLDKJFSKLDFJSDKFJSLDKFJSLKDFJSLDKF
KSJDFLSJDFKLSDJFSLDKFJSLDKFJSLDKFJLK
> jai_Seq14564564
HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH
KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD
KJSLDKJFSKLDFJSDKFJSLDKFJSLKDFJSLDKF
KSJDFLSJDFKLSDJFSLDKFJSLDKFJSLDKFJLK
> jai_Seq14654564
HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH
KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD
KJSLDKJFSKLDFJSDKFJSLDKFJSLKDFJSLDKF
KSJDFLSJDFKLSDJFSLDKFJSLDKFJSLDKFJLK
> jai_Seq1werwr456446
HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH
KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD
KJSLDKJFSKLDFJSDKFJSLDKFJSLKDFJSLDKF
KSJDFLSJDFKLSDJFSLDKFJSLDKFJSLDKFJLK
> jai_Seq146456456
HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH
KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD
KJSLDKJFSKLDFJSDKFJSLDKFJSLKDFJSLDKF
KSJDFLSJDFKLSDJFSLDKFJSLDKFJSLDKFJLK
> jai_Seq64654
HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH
KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD
KJSLDKJFSKLDFJSDKFJSLDKFJSLKDFJSLDKF
KSJDFLSJDFKLSDJFSLDKFJSLDKFJSLDKFJLK

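The marker line always starts with ">". I want to split the file using ">" as the marker. After splitting the file, I want to store the pieces in a list, "seq_list", as shown below. After that, I know how to write the separate files.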

 seq_list = [ "> jai_Seq1flkgh456456
                HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH
                KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD
                KJSLDKJFSKLDFJSDKFJSLDKFJSLKDFJSLDKF
                KSJDFLSJDFKLSDJFSLDKFJSLDKFJSLDKFJLK",
               "> jai_Seq14564564
                HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH
                KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD
                KJSLDKJFSKLDFJSDKFJSLDKFJSLKDFJSLDKF
                KSJDFLSJDFKLSDJFSLDKFJSLDKFJSLDKFJLK",
               "> jai_Seq14654564
                HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH
                KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD
                KJSLDKJFSKLDFJSDKFJSLDKFJSLKDFJSLDKF
                KSJDFLSJDFKLSDJFSLDKFJSLDKFJSLDKFJLK",
               "> jai_Seq1werwr456446
               HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH
               KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD
               KJSLDKJFSKLDFJSDKFJSLDKFJSLKDFJSLDKF
               KSJDFLSJDFKLSDJFSLDKFJSLDKFJSLDKFJLK",
               "> jai_Seq146456456
               HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH
               KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD
               KJSLDKJFSKLDFJSDKFJSLDKFJSLKDFJSLDKF
               KSJDFLSJDFKLSDJFSLDKFJSLDKFJSLDKFJLK",
                "> jai_Seq64654
               HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH
               KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD
               KJSLDKJFSKLDFJSDKFJSLDKFJSLKDFJSLDKF
               KSJDFLSJDFKLSDJFSLDKFJSLDKFJSLDKFJLK"]

Finally, I can loop over "seq_list" and write out the files.


Tags: file, data, in, marker, txt, line, sequence, seq
1 Answer
User
Answer 1 · Posted 2024-09-30 06:21:35
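tf = None  # the current target file

with open("seq.txt") as seq:
    for line in seq:
        if line.startswith(">"):
            # close the current target file, if any
            if tf:
                tf.close()
            # open a new target file named after the sequence ID
            tf = open(line.split()[1], "w")
        elif tf:
            # sequence lines go to the file opened for the latest header
            tf.write(line)

# close the last target file
if tf:
    tf.close()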

Running this creates the following files:

jai_Seq14564564
jai_Seq146456456
jai_Seq14654564
jai_Seq1793857
jai_Seq1flkgh456456
jai_Seq1werwr456446
jai_Seq64654
jai_Seq8347628

Each file contains four lines: the sequence lines only, since the ">" header line itself is not written.
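
If you would rather build the "seq_list" described in the question first, something like the following might work. This is only a minimal sketch and not part of the answer above: the function name split_records is made up for illustration, and it assumes every record begins with a line that starts with ">". Unlike the code above, it keeps the header line inside each record, as the question's list does. The sample input is shortened from the question's data.

```python
def split_records(lines):
    """Group lines into records, starting a new record at each '>' line."""
    records = []
    current = None  # lines of the record being built; None until the first header
    for line in lines:
        if line.startswith(">"):
            # a header closes the previous record and starts a new one
            if current is not None:
                records.append("".join(current))
            current = [line]
        elif current is not None:
            # lines before the first header are ignored (an assumption of this sketch)
            current.append(line)
    if current is not None:
        records.append("".join(current))
    return records


# shortened sample taken from the question's input
sample = [
    "> jai_Seq1flkgh456456\n",
    "HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH\n",
    "KJSLKDJFLSKDJFSLDKFJSLKDFJSLKDFJLSKD\n",
    "> jai_Seq14564564\n",
    "HFSKDFHSKDHFAKHKASDHFKASDHFSKHFSKDFH\n",
]
seq_list = split_records(sample)
print(len(seq_list))
```

For the real file, an open file object can be passed in place of the sample list, for example `with open("seq.txt") as seq: seq_list = split_records(seq)`, because the function only iterates over lines.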
