Removing specific strings and specific blank lines in Python

Posted 2024-09-25 14:24:29

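I am trying to remove specific strings and blank lines from a text file; this follows on from my earlier question. I have looked at several examples and at the solutions our experts suggested. They remove the strings correctly, but not the blank lines. To make the problem easier to follow, I have laid it out below.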

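Some parts of the text file contain lines with stringA, stringB and stringC, followed by blank lines. Only the single line directly below them should be removed.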

line0
line1      stringAxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line2                stringBxxxxxxxxxxxxxxxxxxxxxxx
line3        stringCxxxxxxxxxxxxxxxxxxx 
line4
line5
line6  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line7  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line8  
line9  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line10 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line11               stringBxxxxxxxxxxxxxxxxxxxxxxx
line12       stringCxxxxxxxxxxxxxxxxxxx  
line13
line14
line15  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line16  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line17 
line18  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line19  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line20
line21  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line22  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line23 
line24  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line25  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line26               stringBxxxxxxxxxxxxxxxxxxxxxxx
line27       stringCxxxxxxxxxxxxxxxxxxx  
line28
line29
line30  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line31  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line32  

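In this example, remove every line that contains stringA, stringB or stringC, together with the one line that follows. That is: remove lines 1, 2, 3 and 4; remove lines 11, 12 and 13; remove lines 26, 27 and 28.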

I tried strip(), but it removed all the blank lines. Below is the script I am using; it does remove every line containing stringA, stringB or stringC.

import re

filename = 'raw.txt'
with open(filename, 'r') as fin:
    lines = fin.readlines()
with open('clean.txt', 'w') as fout:
    for line in lines:
        # Drop lines that start with whitespace followed by one of the markers
        if not re.match(r"\s+(stringA|stringB|stringC)", line):
            fout.write(line)

Expected output

line0
line5
line6  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line7  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line8  
line9  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line10 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line14
line15  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line16  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line17 
line18  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line19  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line20
line21  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line22  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line23 
line24  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line25  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line29
line30  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line31  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line32  

Thank you very much for your kind help.


2 Answers

I'm fairly sure this isn't the best answer, but a flag-based approach works:

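import re
filename = 'raw.txt'
with open(filename, 'r') as fin:
    lines = fin.readlines()

flag = 0  # 1 while the previous line contained one of the marker strings

with open('clean.txt', 'w') as fout:
    for line in lines:
        if not re.match(r'.*(stringA|stringB|stringC)', line):
            # Write the line unless it directly follows a marker line
            if not flag:
                fout.write(line)
            flag = 0
        else:
            # Marker line: drop it and remember to drop the next line too
            flag = 1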

Hope this helps.
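
The flag idea above can also be sketched as a small generator that works on any iterable of lines, which makes it easy to try without touching files. The helper name drop_marked_lines and the sample lines are made up for illustration; only the three marker strings come from the question.

```python
import re

# Marker strings taken from the question; compiled once for reuse.
MARKER = re.compile(r"stringA|stringB|stringC")


def drop_marked_lines(lines):
    """Yield lines, skipping marker lines and the one line after each run of them.

    Hypothetical helper, not part of the original answer.
    """
    skip_next = False
    for line in lines:
        if MARKER.search(line):
            skip_next = True       # marker line: drop it, remember to drop the follower
        elif skip_next:
            skip_next = False      # first non-marker line after a run: drop it too
        else:
            yield line


# Made-up sample: three marker lines followed by two blank lines.
sample = [
    "line0\n",
    "  stringAxxx\n",
    "  stringBxxx\n",
    "  stringCxxx\n",
    "\n",
    "\n",
    "text\n",
]
result = list(drop_marked_lines(sample))
```

Here only the first blank line after the marker run is dropped and the second one is kept. To apply it to files, something like fout.writelines(drop_marked_lines(fin)) should work with the two open file objects.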

An optimized solution:

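import re

with open('raw.txt', 'r') as fin, open('clean.txt', 'w') as fout:
    string_c_pat = re.compile(r'\s+stringC')
    pat = re.compile(r"\s+(stringA|stringB|stringC)")

    for line in fin:    # traverse the file as an iterator
        if string_c_pat.match(line):
            # Consume the line right after a `stringC` line so it is never written.
            # This relies on each block of marker lines ending with `stringC`, as in
            # the sample; the None default avoids StopIteration at end of file.
            next(fin, None)
        if not pat.match(line):
            fout.write(line)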

Using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
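
As a rough illustration of reusing one compiled pattern, the sketch below applies the same object to several strings. It also shows a detail worth keeping in mind with this regex: match() only succeeds at the start of the string, so the pattern requires the line to begin with whitespace, while search() scans the whole line. The three sample strings are invented for the demonstration.

```python
import re

# Compile once, then reuse the same object for every line.
pattern = re.compile(r"\s+(stringA|stringB|stringC)")

# Invented sample lines, not taken from the real input file.
indented = "      stringAxxxx\n"      # starts with whitespace
flush_left = "stringAxxxx\n"          # marker at column 0
labelled = "line1   stringAxxxx\n"    # other text before the marker

samples = (indented, flush_left, labelled)

# match() anchors at the beginning of the string.
match_results = [bool(pattern.match(s)) for s in samples]

# search() looks for the pattern anywhere in the string.
search_results = [bool(pattern.search(s)) for s in samples]

print(match_results)
print(search_results)
```

If the real lines can start with a marker at column 0, or carry other text before it, a pattern such as r"\s*(stringA|stringB|stringC)" with match(), or search() with the bare alternation, may be a better fit.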
