Delete lines that begin with a unique number

0 Kurthia sibirica Planococcaceae
1593 Lactobacillus hordei Lactobacillaceae
1121 Lactobacillus coleohominis Lactobacillaceae
614 Lactobacillus coryniformis Lactobacillaceae
57 Lactobacillus kitasatonis Lactobacillaceae
3909 Lactobacillus malefermentans Lactobacillaceae

#!/usr/bin/env python

infilename = 'v35.clusternum.species.txt'
outfilename = 'v13clusters.no.singletons.txt'

#remove extra letters and spaces
x = 0
with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
    for line in infile:
        clu, gen, spec, fam = line.split()
        for clu in line:
            if clu.count > 1:
                #print line
                outfile.write(line)
            else:
                x += 1
print("Number of Singletons:")
print(x)

1 answer

User

#1 · Posted on 2024-09-28 05:27:33

Well, your code is heading in the right direction, but a few things have gotten mixed up.

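You need to split what the script does into two logical steps: first, aggregate (count) all the clu fields; second, write out every line whose clu count is greater than 1. You tried to do both steps at once, and that did not work. Technically it can be done that way, but your syntax is wrong. Repeatedly searching through the file for matches is also very inefficient; it is better to read it only once or twice.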

So let's separate the steps. First, count your clu fields. The collections module has a Counter you can use for this.

from collections import Counter
with open(infilename, 'r') as infile:
    c = Counter(line.split()[0] for line in infile)

c is now a Counter that you can use to look up the count for a given clu.
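To make the lookup concrete, here is a small sketch that builds a Counter from a few in-memory lines instead of a file. The sample lines are hypothetical and only mimic the "clu gen spec fam" layout from the question; the name counts is likewise just for illustration.

```python
from collections import Counter

# Hypothetical sample lines in the same "clu gen spec fam" layout as the question.
sample_lines = [
    "0 Kurthia sibirica Planococcaceae\n",
    "1593 Lactobacillus hordei Lactobacillaceae\n",
    "0 Kurthia gibsonii Planococcaceae\n",
]

# Count how often each first field (the cluster number, kept as a string) occurs.
counts = Counter(line.split()[0] for line in sample_lines)

print(counts["0"])     # cluster "0" appears on two lines
print(counts["1593"])  # cluster "1593" appears on one line, so it is a singleton
print(counts["42"])    # a key that was never seen yields 0 rather than raising KeyError
```

Note that the keys are strings, because line.split() returns strings; look them up with the same type.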

with open(infilename, 'r') as infile, open(outfilename, 'w') as outfile:
    for line in infile:
        clu, gen, spec, fam = line.split()
        if c[clu] > 1:
            outfile.write(line)
