Is there a better way to read several txt files in Python?

begin(model(tb4)).
...
sequence_length(187).
amino_acid_pair_ratio(a,a,24.8).
amino_acid_pair_ratio(a,c,0.0).
...
tb_to_tb_evalue(tb3671,1.100000e-01).
tb_to_tb_evalue(tb405,4.300000e-01).
tb_to_tb_evalue(tb3225,5.600000e-01).
...
end(model(tb4))
begin(model(tb56)).
......
end(model(tb56))

import glob

def readorfs():
    # Path of the folder that stores the files
    path = "data/orfs"
    # Collect the file names
    all_files = glob.glob(path + "/*.txt")
    # Read the files line by line
    for filename in all_files:
        with open(filename) as f:
            lines = f.readlines()  # Reads the file line by line
            for line in lines:
                if line.startswith("begin(model(") and (myarray[i]) in line:
                    print(line)

2 Answers

User

#1 · Edited 2024-10-02 18:28:19

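Here is my suggestion: first create a dictionary in which every item of myarray is a key with the value 0. Then write a function that processes a single file: load the whole file as text, split it on 'begin(model(', and count the occurrences of 'tb_to_tb_evalue' in each piece. Add the results to the dictionary. Finally, run this function on every file. See below: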

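import glob

d = {i: 0 for i in myarray}

def readorfs(file):
    # Load the whole file, then split it into one chunk per model
    with open(file) as f:
        t = f.read()
    l = t.split(sep='begin(model(')[1:]
    for i in l:
        s = i[:i.find(')')]  # the model name is everything before the first ')'
        # Models that are not in myarray are added to d as well
        d[s] = d.get(s, 0) + i.count('tb_to_tb_evalue')

all_files = glob.glob("data/orfs/*.txt")
for filename in all_files:
    readorfs(filename)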

You can also process all the files inside a single function, as shown below. In that case, you have to pass myarray as the function's argument:

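import glob

def readorfs(myarray):
    d = {i: 0 for i in myarray}
    path = "data/orfs"
    all_files = glob.glob(path + "/*.txt")
    for filename in all_files:
        # Load the whole file, then split it into one chunk per model
        with open(filename) as f:
            t = f.read()
        l = t.split(sep='begin(model(')[1:]
        for i in l:
            s = i[:i.find(')')]  # the model name is everything before the first ')'
            # Models that are not in myarray are added to d as well
            d[s] = d.get(s, 0) + i.count('tb_to_tb_evalue')
    return d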
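
To make the splitting step easier to follow, here is a minimal sketch that applies the same idea to an in-memory string instead of files. The sample text and the helper name count_evalues are made up for illustration, and the sketch assumes the records look like the excerpt in the question.

```python
# Hypothetical sample text in the format shown in the question
sample = (
    "begin(model(tb4)).\n"
    "sequence_length(187).\n"
    "tb_to_tb_evalue(tb3671,1.100000e-01).\n"
    "tb_to_tb_evalue(tb405,4.300000e-01).\n"
    "end(model(tb4))\n"
    "begin(model(tb56)).\n"
    "tb_to_tb_evalue(tb3225,5.600000e-01).\n"
    "end(model(tb56))\n"
)

def count_evalues(text):
    """Return {model_name: number of 'tb_to_tb_evalue' occurrences}."""
    counts = {}
    # [1:] drops whatever precedes the first 'begin(model(' marker
    for chunk in text.split("begin(model(")[1:]:
        name = chunk[:chunk.find(")")]  # text before the first ')'
        counts[name] = counts.get(name, 0) + chunk.count("tb_to_tb_evalue")
    return counts

print(count_evalues(sample))
```

Each chunk starts right after the marker, so the model name is simply the text up to the first closing parenthesis.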

User

#2 · Edited 2024-10-02 18:28:19

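Here is a suggestion that uses defaultdict from collections, with the model name as the key and the number of tb_to_tb_evalue lines for that model as the value. Because you read every file in full anyway, collecting the counts for all models adds no real overhead, and pulling out the counts for the specific models in your list afterwards is very simple.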

from collections import defaultdict
import re
tb_count = defaultdict(int)
# create regular expression to find the model name from the "begin(model(...))" lines
model_regex = re.compile(r"begin\(model\((.*)\)\)")
for file in all_filenames:
    model = None  # initiate model as None for each file, but value will be changed if begin(model( line is encountered
    with open(file) as f:
        for line in f:
            if line.startswith("begin(model("):
                # identify the model name
                match = model_regex.search(line)
                if match:
                    model = match.group(1)
            if line.startswith("tb_to_tb_evalue("):
                tb_count[model] += 1  # increase the count for the current active model

So it iterates over all the files, but only once. Finally, to get the counts for the models in a particular list (e.g. myarray), you can write something like:

models_of_interest = {k: v for k, v in tb_count.items() if k in myarray }
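
For reference, the line-by-line approach can also be wrapped into small functions, roughly as sketched below. The names count_lines, count_files and select_models are hypothetical, and the sketch assumes every fact sits on its own line, as the startswith checks imply. Unlike the snippet above, it ignores tb_to_tb_evalue lines that appear outside any model and reports 0 for requested models that were never seen.

```python
import re
from collections import defaultdict

# Captures the model name from lines such as "begin(model(tb4))."
MODEL_RE = re.compile(r"begin\(model\(([^)]*)\)\)")

def count_lines(lines, counts):
    """Add the number of 'tb_to_tb_evalue(' lines per model to counts."""
    model = None  # no model is active until a begin(model(...)) line is seen
    for line in lines:
        if line.startswith("begin(model("):
            match = MODEL_RE.search(line)
            model = match.group(1) if match else None
        elif line.startswith("end(model("):
            model = None
        elif model is not None and line.startswith("tb_to_tb_evalue("):
            counts[model] += 1

def count_files(filenames):
    """Count the lines for every model across all files, reading each file once."""
    counts = defaultdict(int)
    for filename in filenames:
        with open(filename) as f:
            count_lines(f, counts)  # a file object yields its lines one by one
    return counts

def select_models(counts, names):
    """Return the counts for the requested names only (0 if a name was not seen)."""
    return {name: counts.get(name, 0) for name in names}
```

With the paths from the question, a call might look like select_models(count_files(glob.glob("data/orfs/*.txt")), myarray), after importing glob.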
