How to print lines longer than a specific length

Posted 2024-09-19 23:30:47


I have an input file like this:

@sample1
ATGGTTCCAAGGCCTTGGTTAATTGGGGGGTTTTTTTTTTTTTTTTTTT

@sample2
TTGGAACCTTGGCCAATTAAGGGGGGGGGTTTTTTTCCCCCCCCCCCCC

@sample3
GGTTGGTTGGGAATTTGGTTAACCTTTTTAAATTTTTTTTTTTGGGGGG
AATTTTTTTTTTTTTGG

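I want to print the records whose sequence has at least a given minimum length. For example, if the minimum length I want is 66, the output would be: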

@sample3
GGTTGGTTGGGAATTTGGTTAACCTTTTTAAATTTTTTTTTTTGGGGGG
AATTTTTTTTTTTTTGG

because only the sequence of sample3, counted across both of its lines, reaches the minimum length of 66.

Here is my code:

    fastfile = {}
    with open(sys.argv[1]) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith("@"):
                sequencenumber = line[1:]
                if sequencenumber not in fastfile:
                    fastfile[sequencenumber] = []
                continue
            sequence = line
            fastfile[sequencenumber].append(sequence)

            output = []
            for key, value in fastfile.items():
                if len(value) >= sys.argv[2]:
                    output.append(value)
                    print (output)

sys.argv[1] is the path to the input file and sys.argv[2] is the minimum length.


2 Answers

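You want the values of the fastfile dictionary to be strings rather than lists, so instead of appending each sequence line to a running list, concatenate it onto a running string: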

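    import sys

    fastfile = {}
    with open(sys.argv[1]) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            if line[0] == "@":
                # header line: start (or resume) the record with this name
                sequencenumber = line[1:]
                if sequencenumber not in fastfile:
                    fastfile[sequencenumber] = ""
                continue
            # sequence line: append it to the current record's string
            fastfile[sequencenumber] += line

    # command-line arguments are strings, so convert before comparing with len()
    min_length = int(sys.argv[2])

    output = []
    for key, value in fastfile.items():
        if len(value) >= min_length:
            output.append(value)
    print(output)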

Alternatively, if you need to keep the strings in a list as in your original code, use "".join(value) to join them all together, like this:

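    min_length = int(sys.argv[2])  # argv values are strings; convert before comparing

    output = []
    for key, value in fastfile.items():
        joined = "".join(value)  # concatenate all lines of this record
        if len(joined) >= min_length:
            output.append(joined)
    print(output)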
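If the goal is to reproduce the output shown in the question (the @ header followed by the original sequence lines), the same idea can be sketched as a small function. This is only a minimal sketch: the name filter_records and its return shape are made up for illustration, and it assumes every sequence line follows a header line that starts with "@".

```python
import sys


def filter_records(lines, min_length):
    """Return (name, sequence_lines) pairs whose combined length is >= min_length.

    filter_records is a hypothetical helper, not part of any library.
    """
    records = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith("@"):
            current = line[1:]
            records.setdefault(current, [])
        elif current is not None:
            records[current].append(line)
    # Compare the total length of all lines in a record, not the line count.
    return [
        (name, seqs)
        for name, seqs in records.items()
        if sum(len(s) for s in seqs) >= min_length
    ]


# Command-line use: python script.py <input file> <minimum length>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        kept = filter_records(f, int(sys.argv[2]))
    for name, seqs in kept:
        print("@" + name)
        print("\n".join(seqs))
```

Keeping the lines in a list and summing their lengths lets the record be printed back exactly as it was split in the input file.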

This seems much simpler:

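    from sys import argv

    with open(argv[1]) as fin:
        text = fin.read()

    min_length = int(argv[2])

    # each part is one record: a header line followed by its sequence lines
    parts = [p for p in text.split('@') if p.strip()]
    # keep only the parts whose sequence lines add up to at least min_length
    parts = [p for p in parts
             if sum(len(i.strip()) for i in p.split('\n')[1:]) >= min_length]

    # put back the '@' that split() removed from each header
    output = ''.join('@' + p for p in parts)
    print(output)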
