How do I create a list in Python when the input file has multiple delimiters?

Published 2024-05-08 12:50:42


The sample file looks like this:

 ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
  '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
  '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
  '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n','\n',
  '$$$\n', '\n',
  '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
  '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
  '>B5\n', 'TTCGTGGGTATT\n', '>B6\n','TTCGGGGGTATC\n',
  '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
  '>B9\n', 'TTCGGGGGTATC\n','>B10\n', 'TTCGGGGGTATC\n',
  '>B42\n', 'TT-GTGGGTATC\n']

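The $$$ line separates the two sets. I need to use the .strip function to remove the \n characters and drop all the "headers" (the lines starting with >).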

I need to build a list like the one below, and also replace every "-" with "Z":

  [ 'TCCGGGGGTATC','TCCGTGGGTATC','TCCGTGGGTATC', 'TCCGGGGGTATC',
    'TCCGTGGGTATC','TCCGTGGGTATC','TCCGTGGGTATC', 'TCCGGGGGTATC',
    'ATCGGGGGTATT','TT-GTGGGAATC','TTCGTGGGAATC', 'TT-GTGGGTATC',
    'TTCGTGGGTATT','TTCGGGGGTATC','TT-GTGGGTATC', 'TTCGGGGGAATC',
    'TTCGGGGGTATC','TTCGGGGGTATC','TT-GTGGGTATC']
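The steps described above (strip each line, drop blank lines, the ">" headers and the $$$ separator, replace "-" with "Z") can be sketched as one small function. This is a minimal sketch, not code from the question or the answer; the helper name read_sequences is hypothetical, and it assumes the separator always appears alone on its line as $$$.

```python
from typing import Iterable, List


def read_sequences(lines: Iterable[str]) -> List[str]:
    """Return one flat list of sequences from FASTA-like lines.

    Hypothetical helper: skips blank lines, '>' header lines and the
    '$$$' set separator, and replaces every '-' with 'Z'.
    """
    sequences = []
    for line in lines:
        line = line.strip()              # drop the trailing '\n' and spaces
        if not line or line.startswith('>') or line == '$$$':
            continue                     # blank line, header or separator
        sequences.append(line.replace('-', 'Z'))
    return sequences


# A short excerpt of the sample data
sample = ['>1\n', 'TCCGGGGGTATC\n', '\n', '$$$\n', '\n',
          '>B2\n', 'TT-GTGGGAATC\n']
print(read_sequences(sample))  # ['TCCGGGGGTATC', 'TTZGTGGGAATC']
```

Because iterating over a file object yields its lines (with the trailing "\n"), the same function can take an open file directly, e.g. with open('sequences.txt') as f: seqs = read_sequences(f), where sequences.txt is a placeholder file name.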

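Below is a link to code (https://stackoverflow.com/a/39965048/6820344) that answers a similar question. I tried to modify that code to get the output above, but I can't get a list without the "$$$" in it. Also, I need a single list, not a list of lists.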

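# lst: the lines of the input file (the list shown above)
seq_list = []
for x in lst:
    if x.startswith('>'):
        seq_list.append([])    # starts a new sub-list at every header
        continue
    x = x.strip()
    if x:
        # '$$$' is not filtered out, so it ends up in the sub-list for '>8'
        seq_list[-1].append(x.replace("-", "Z"))
print(seq_list)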
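A minimal adaptation of the attempt above, sketched here rather than taken from the linked answer: keep the per-header grouping, skip the $$$ line explicitly, and then flatten the list of lists into a single list with a nested list comprehension. The lst below is an abbreviated copy of the question's data.

```python
# Abbreviated copy of the question's input lines
lst = ['>1\n', 'TCCGGGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n', '\n',
       '$$$\n', '\n',
       '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n']

seq_list = []
for x in lst:
    if x.startswith('>'):
        seq_list.append([])          # new sub-list for each header
        continue
    x = x.strip()
    if x and x != '$$$':             # skip blank lines and the separator
        seq_list[-1].append(x.replace('-', 'Z'))

# Flatten the list of lists into one list
flat = [seq for group in seq_list for seq in group]
print(flat)  # ['TCCGGGGGTATC', 'TCCGGGGGTATC', 'ATCGGGGGTATT', 'TTZGTGGGAATC']
```

The grouping step assumes the data starts with a ">" header line; otherwise seq_list[-1] would raise an IndexError on the first sequence.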

1 answer
Anonymous user
#1 · Posted 2024-05-08 12:50:42
from pprint import pprint

# The lines read from the input file
lines = ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n',
         '>3\n', 'TCCGTGGGTATC\n', '>4\n', 'TCCGGGGGTATC\n',
         '>5\n', 'TCCGTGGGTATC\n', '>6\n', 'TCCGTGGGTATC\n',
         '>7\n', 'TCCGTGGGTATC\n', '>8\n', 'TCCGGGGGTATC\n', '\n',
         '$$$\n', '\n',
         '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n',
         '>3\n', 'TTCGTGGGAATC\n', '>B4\n', 'TT-GTGGGTATC\n',
         '>B5\n', 'TTCGTGGGTATT\n', '>B6\n', 'TTCGGGGGTATC\n',
         '>B7\n', 'TT-GTGGGTATC\n', '>B8\n', 'TTCGGGGGAATC\n',
         '>B9\n', 'TTCGGGGGTATC\n', '>B10\n', 'TTCGGGGGTATC\n',
         '>B42\n', 'TT-GTGGGTATC\n']

output = []

for elem in lines:
    # Skip '>' header lines, the '$$$' separator and blank lines
    if elem.startswith('>') or \
       elem.startswith('$') or \
       elem.isspace():
        continue

    # Replace '-' with 'Z', then strip the trailing '\n'
    output.append(elem.replace('-', 'Z').strip())

pprint(output, compact=True)

Running the code above produces the following output:

['TCCGGGGGTATC', 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC', 'TCCGTGGGTATC',
 'TCCGTGGGTATC', 'TCCGTGGGTATC', 'TCCGGGGGTATC', 'ATCGGGGGTATT', 'TTZGTGGGAATC',
 'TTCGTGGGAATC', 'TTZGTGGGTATC', 'TTCGTGGGTATT', 'TTCGGGGGTATC', 'TTZGTGGGTATC',
 'TTCGGGGGAATC', 'TTCGGGGGTATC', 'TTCGGGGGTATC', 'TTZGTGGGTATC']
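If the two sets ever need to be told apart, the $$$ line can start a new group instead of simply being skipped, and itertools.chain.from_iterable still gives the single flat list the question asks for. This is a sketch of a variant, not part of the answer above; it assumes $$$ appears alone on its line, and the input is an abbreviated copy of the sample data.

```python
from itertools import chain

# Abbreviated copy of the sample input lines
lines = ['>1\n', 'TCCGGGGGTATC\n', '>2\n', 'TCCGTGGGTATC\n', '\n',
         '$$$\n', '\n',
         '>B1\n', 'ATCGGGGGTATT\n', '>B2\n', 'TT-GTGGGAATC\n']

sets = [[]]                          # one sub-list per '$$$'-separated set
for line in lines:
    line = line.strip()
    if line == '$$$':
        sets.append([])              # separator: start the next set
    elif line and not line.startswith('>'):
        sets[-1].append(line.replace('-', 'Z'))

print(sets)                            # sequences grouped by set
print(list(chain.from_iterable(sets)))  # the same sequences as one flat list
```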
