How to conditionally delete a sequence of lines from a txt file using Python

NAME: C11H11NO5; PlaSMA ID-967
PRECURSORMZ: 238.0712
PRECURSORTYPE: [M+H]+
FORMULA: C11H11NO5
Ontology: Formula predicted
INCHIKEY:
SMILES:
RETENTIONTIME: 1.74
CCS: -1
IONMODE: Positive
COLLISIONENERGY:
Comment: Annotation level-3; PlaSMA ID-967; ID title-AC_Bulb_Pos-629; Max plant tissue-LE_Ripe_Pos
Num Peaks: 2
192.06602   53
238.0757    31

NAME: Malvidin-3,5-di-O-glucoside; PlaSMA ID-3141
PRECURSORMZ: 656.19415
PRECURSORTYPE: [M+H]+
FORMULA: C29H35O17
Ontology: Anthocyanidin O-glycosides
INCHIKEY: CILLXFBAACIQNS-UHFFFAOYNA-O
SMILES: COC1=CC(=CC(OC)=C1O)C1=C(OC2OC(CO)C(O)C(O)C2O)C=C2C(OC3OC(CO)C(O)C(O)C3O)=CC(O)=CC2=[O+]1
RETENTIONTIME: 2.81
CCS: 241.3010517
IONMODE: Positive
COLLISIONENERGY:
Comment: Annotation level-1; PlaSMA ID-3141; ID title-Malvidin-3,5-di-O-glucoside; Max plant tissue-Standard only
Num Peaks: 0

lines[indices[diff14[0]] : indices[diff14[1]]]
lines[indices[diff14[1]+1] : indices[diff14[2]]]
lines[indices[diff14[2]+1] : indices[diff14[3]]]
lines[indices[diff14[3]+1] : indices[diff14[4]]]
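The slicing above hard-codes one slice per record. Below is a minimal sketch of the same idea generalized. It is my own reconstruction, since the question doesn't show how indices and diff14 are built. It collects the index of every NAME: line, slices between consecutive indices, and keeps a slice only if its Num Peaks: value is non-zero. The function name drop_empty_records is hypothetical, and any lines before the first NAME: line are dropped.

```python
def drop_empty_records(lines):
    """Return the lines of every record whose 'Num Peaks:' value is non-zero.

    Assumes each record starts with a line beginning with 'NAME:' and
    contains at most one 'Num Peaks:' line (hypothetical helper).
    """
    # Index of the first line of every record, plus a sentinel for the end.
    indices = [i for i, line in enumerate(lines) if line.startswith('NAME:')]
    indices.append(len(lines))

    kept = []
    # Slice between consecutive record starts, like lines[indices[k]:indices[k + 1]].
    for start, end in zip(indices, indices[1:]):
        record = lines[start:end]
        peaks = next(
            (int(row.split(':', 1)[1]) for row in record if row.startswith('Num Peaks:')),
            0,  # treat a record without a 'Num Peaks:' line as empty
        )
        if peaks != 0:
            kept.extend(record)
    return kept
```

To use it on a file, read the lines with f.readlines() and write the result with out.writelines(drop_empty_records(lines)); the input and output file names are up to you.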

3 answers

User

Answer 1 · edited 2024-06-28 11:33:58

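Here is a fairly simple way to process the file.

Open the data file and iterate over its lines, storing them in a list (the cache). If a line starts with NAME:, it begins a new record; if the cache is not empty at that point, it holds the previous record, so print it and clear the cache.

If the line starts with Num Peaks:, check the value. If it is zero, the cache is cleared, so this record is forgotten.

Skip lines that contain only whitespace.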

with open('data') as f:
    line_cache = []
    for line in f:
        if line.startswith('NAME:'):
            if line_cache:
                print(*line_cache, sep='')
                line_cache = []
        elif line.startswith('Num Peaks:'):
            num_peaks = int(line.partition(': ')[2])
            if num_peaks == 0:
                line_cache = []
                continue

        if line.strip():        # filter empty lines
            line_cache.append(line)

    if line_cache:    # don't forget the last record
        print(*line_cache, sep='', end='')

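The output goes to standard output, which you can redirect to a file in the shell. If you want to write directly to a file instead, open it at the start and modify the print() calls: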

with open('output', 'w') as output, open('data') as f:
    ...

and add file=output to the print() calls, for example:

print(*line_cache, sep='', file=output)
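With both changes applied, the same loop can be wrapped in a function that takes any iterable of lines and any writable text file, which also makes it easy to try on a string via io.StringIO. This is a sketch: the name filter_records is hypothetical, and it keeps the answer's assumption that a Num Peaks: 0 record has no further non-blank lines after its Num Peaks: line.

```python
def filter_records(lines, output):
    """Copy records from `lines` to `output`, skipping 'Num Peaks: 0' records."""
    line_cache = []
    for line in lines:
        if line.startswith('NAME:'):
            # A new record starts: flush the previous one, if any.
            # print() adds a trailing '\n', which separates records.
            if line_cache:
                print(*line_cache, sep='', file=output)
                line_cache = []
        elif line.startswith('Num Peaks:'):
            num_peaks = int(line.partition(': ')[2])
            if num_peaks == 0:
                # Forget the current record entirely.
                line_cache = []
                continue

        if line.strip():  # filter empty lines
            line_cache.append(line)

    if line_cache:  # don't forget the last record
        print(*line_cache, sep='', end='', file=output)
```

Used on real files, with the answer's placeholder names: with open('output', 'w') as output, open('data') as f: filter_records(f, output)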

User

Answer 2 · edited 2024-06-28 11:33:58

# Open / read the tmp file created from the text you supplied
with open('tmpWrt.txt', 'r') as filedat:
    filelines = filedat.readlines()

tmp_lines = []
flag = 'n'
line_count = 0

# Open the output file object
with open('tmp_out.txt', 'w') as file_out:
    # Iterate through all file lines
    for line in filelines:
        fields = line.split()  # empty for blank or whitespace-only lines

        # If line is the beginning of a section,
        # reset the tmp variables
        if fields and fields[0] == "NAME:":
            tmp_lines = []
            flag = 'n'

        tmp_lines.append(line)
        line_count += 1

        # If peaks > 0, set flag to "y"
        if fields and fields[0] == "Num":
            if int(fields[2]) != 0:
                flag = 'y'

        # If line is the end of a section and peaks > 0,
        # write the section to the file
        if (not fields or line_count == len(filelines)) and flag == 'y':
            file_out.writelines(tmp_lines)
            flag = 'n'  # don't write the same section again on the next blank line

User

Answer 3 · edited 2024-06-28 11:33:58

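This isn't as compact or efficient as the other answers, but hopefully it is easier to understand and extend.

The approach I suggest is to parse your input into a list of lists, where each element holds one compound. I suggest three steps: (1) parse the data into a list of compounds, (2) iterate over this list, removing the compounds you don't need, and (3) write the list back out to a file. Depending on the size of the file, you can do this with a single loop over the data or with three separate loops.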
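To illustrate the single-loop option, the three steps can be chained as generators, so each compound is parsed, filtered, and written before the next one is read, without holding the whole file in memory. This is a sketch assuming compounds are separated by one or more empty lines; the helper names iter_compounds, num_peaks, write_compounds and filter_file, and the file paths you pass in, are hypothetical.

```python
def iter_compounds(lines):
    """Step (1): yield each compound as a list of its lines.

    Assumes compounds are separated by one or more empty lines.
    """
    current = []
    for line in lines:
        if line.strip():
            current.append(line)
        elif current:
            yield current
            current = []
    if current:  # last compound when the file doesn't end with an empty line
        yield current


def num_peaks(compound):
    """Return the 'Num Peaks:' value of a compound, or 0 if it is missing."""
    for line in compound:
        if line.startswith('Num Peaks:'):
            return int(line.split(':', 1)[1])
    return 0


def write_compounds(compounds, out):
    """Step (3): write compounds to a file object, separated by empty lines."""
    for i, compound in enumerate(compounds):
        if i:
            out.write('\n')
        out.writelines(compound)


def filter_file(in_path, out_path):
    """Run all three steps in one pass over the input file."""
    with open(in_path) as f, open(out_path, 'w') as out:
        # Step (2): a generator expression drops compounds without peaks lazily.
        kept = (c for c in iter_compounds(f) if num_peaks(c) != 0)
        write_compounds(kept, out)
```

The three-loop version is easier to debug because you can inspect the full list between steps; the generator version trades that for constant memory use on large files.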

# Step (1) Parse the file
compounds = list()  # store all compounds
with open('compound.txt', 'r') as f:
    # stores a single compound as a list of rows for a given compound.
    # Note: can be improved to e.g. a dictionary or a custom class
    current_compound = list()
    for line in f:
        if line.strip() == '':  # assumes compounds are separated by empty line(s)
            print('Empty line')
            # Store previous compound
            if len(current_compound) != 0:
                compounds.append(list(current_compound))

            # prepare for next compound
            current_compound = list()
        else:
            # At this point we could parse this further,
            # e.g. separate it into key/value, but let's just append the whole line with its trailing newline
            print('Adding', line.strip())
            current_compound.append(line)

    # Store the last compound if the file doesn't end with an empty line
    if len(current_compound) != 0:
        compounds.append(current_compound)

OK, now let's check how that went:

for item in compounds:
    print('\n===Compound===\n', item)

which prints:

===Compound===
 ['NAME: C11H11NO5; PlaSMA ID-967\n', 'PRECURSORMZ: 238.0712\n', 'PRECURSORTYPE: [M+H]+\n', 'FORMULA: C11H11NO5\n', 'Ontology: Formula predicted\n', 'INCHIKEY:\n', 'SMILES:\n', 'RETENTIONTIME: 1.74\n', 'CCS: -1\n', 'IONMODE: Positive\n', 'COLLISIONENERGY:\n', 'Comment: Annotation level-3; PlaSMA ID-967; ID title-AC_Bulb_Pos-629; Max plant tissue-LE_Ripe_Pos\n', 'Num Peaks: 2\n', '192.06602   53\n', '238.0757    31\n']

===Compound===
 ['NAME: Malvidin-3,5-di-O-glucoside; PlaSMA ID-3141\n', 'PRECURSORMZ: 656.19415\n', 'PRECURSORTYPE: [M+H]+\n', 'FORMULA: C29H35O17\n', 'Ontology: Anthocyanidin O-glycosides\n', 'INCHIKEY: CILLXFBAACIQNS-UHFFFAOYNA-O\n', 'SMILES: COC1=CC(=CC(OC)=C1O)C1=C(OC2OC(CO)C(O)C(O)C2O)C=C2C(OC3OC(CO)C(O)C(O)C3O)=CC(O)=CC2=[O+]1\n', 'RETENTIONTIME: 2.81\n', 'CCS: 241.3010517\n', 'IONMODE: Positive\n', 'COLLISIONENERGY:\n', 'Comment: Annotation level-1; PlaSMA ID-3141; ID title-Malvidin-3,5-di-O-glucoside; Max plant tissue-Standard only\n', 'Num Peaks: 0\n']

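You can then iterate over this list of compounds and remove those with Num Peaks set to 0 before writing them back to a file. Let me know if you need help with that too.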
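Steps (2) and (3) might look like this. The sketch assumes compounds is the list of lists built in step (1), where each inner list holds one compound's lines including their trailing newlines. The helper names has_peaks, remove_empty_compounds and save_compounds are my own, not part of the answer.

```python
def has_peaks(compound):
    """Return True if the compound's 'Num Peaks:' line has a non-zero value."""
    for line in compound:
        if line.startswith('Num Peaks:'):
            return int(line.split(':', 1)[1]) != 0
    return False  # assumption: no 'Num Peaks:' line means nothing to keep


def remove_empty_compounds(compounds):
    """Step (2): keep only compounds with at least one peak."""
    return [compound for compound in compounds if has_peaks(compound)]


def save_compounds(compounds, path):
    """Step (3): write compounds back to `path`, one empty line between them."""
    with open(path, 'w') as f:
        # Each compound's last line already ends with '\n', so joining with
        # '\n' leaves exactly one empty line between compounds.
        f.write('\n'.join(''.join(compound) for compound in compounds))
```

For example, save_compounds(remove_empty_compounds(compounds), 'compound_filtered.txt'), where the output file name is just a placeholder.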
