Python regex: search for a string and replace all characters after it

Posted 2024-06-26 15:01:49


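I'm trying to do something I thought would be simple, but I'm having trouble with the regular expression. Specifically, I want to find CAUGHT AN ERROR together with everything after it on the same line, and replace it with CAUGHT AN ERROR: XXXXXX. My understanding is that using .*$ (for example) would let me match up to the end of the line, but I can't get the exact replacement with my for loop. How do I replace everything after the characters I searched for?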

1970-01-01 10:59:02     
1970-01-01 10:59:02    
1970-01-01 10:59:01    CAUGHT AN ERROR: rmv: cannot remove '/media/^Red^XXXXXX.jpg': No such file or directory; FROM: exec rm [file join $drive $newFile] (in USB::Write /)
1970-01-01 10:59:01    CAUGHT AN ERROR: rmv: cannot remove '/media/^Green^XXXXXX.jpg': No such file or directory; FROM: exec rm [file join $drive $newFile] (in USB::Write /media/ug)
1970-01-01 10:59:02    CAUGHT AN ERROR: rmv: cannot remove '/media/^Blue^XXXXXX.jpg': No such file or directory; FROM: exec rm [file join $drive $newFile] (in USB::Write /medi0349223^BradbuXXXXXX.jpg)
1970-01-01 10:59:02    CAUGHT AN ERROR: rmv: cannot remove '/media/^XXXXXX.jpg': No such file or directory; FROM: exec rm [file join $drive $newFile] (in USB::Write /media/usb0 XXXXXX.jpg)
1970-01-01 10:59:02    CAUGHT AN ERROR: rmv: cannot remove '/media/^Orange^XXXXXX.jpg': No such file or directory; FROM: exec rm [file join $drive $newFile] (in USB::Write )
1970-01-01 10:59:02  

I saved the sample log above to a file and then ran the following code:

import re

with open(r'C:\Users\Downloads\LOG\sample.log', mode='r', encoding='utf8') as log_r:
    content = log_r.read()

dict_items = {r'CAUGHT AN ERROR: [A-Z|a-z|0-9|\.|\-|\,|\_|\{|\}|\)|\(|\/]*\+': r'CAUGHT AN ERROR: XXXXXX'}

for k, v in dict_items.items():
    content = re.sub(k, v, content)

print(content)

I also tried the following keys in my dictionary, but they didn't work either:

r'CAUGHT AN ERROR: .\$'
r'CAUGHT AN ERROR: .*$'
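To see why none of these attempts changes anything, here is a minimal check against a shortened two-line sample (the sample text is an assumption, trimmed from the log above). Each pattern fails for a different reason, noted in the comments.

```python
import re

# Hypothetical two-line sample trimmed from the log above;
# the error line is NOT the last line of the string
sample = (
    "1970-01-01 10:59:01    CAUGHT AN ERROR: rmv: cannot remove 'a.jpg'\n"
    "1970-01-01 10:59:02  \n"
)

attempts = {
    # The class has no space, ':' or quote, and the trailing \+ demands
    # a literal '+', which never appears in the log ('|' inside [] is literal)
    "char_class": r'CAUGHT AN ERROR: [A-Z|a-z|0-9|\.|\-|\,|\_|\{|\}|\)|\(|\/]*\+',
    # '.' matches one character, then \$ demands a literal '$' right after it
    "literal_dollar": r'CAUGHT AN ERROR: .\$',
    # Without re.MULTILINE, $ matches only at the end of the string
    # (or just before a final newline), not at the end of every line
    "dollar_no_multiline": r'CAUGHT AN ERROR: .*$',
}

for name, pattern in attempts.items():
    print(name, re.search(pattern, sample))  # every attempt prints None
```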

Expected result

1970-01-01 10:59:02     
1970-01-01 10:59:02    
1970-01-01 10:59:01    CAUGHT AN ERROR: XXXXXX
1970-01-01 10:59:01    CAUGHT AN ERROR: XXXXXX
1970-01-01 10:59:02    CAUGHT AN ERROR: XXXXXX
1970-01-01 10:59:02    CAUGHT AN ERROR: XXXXXX
1970-01-01 10:59:02    CAUGHT AN ERROR: XXXXXX
1970-01-01 10:59:02  

1 answer
User
#1 · Posted 2024-06-26 15:01:49

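r'CAUGHT AN ERROR: .*$' is the correct regexp, but you need to pass the re.MULTILINE flag so that $ matches at the end of each line rather than only at the end of the whole string (or just before a trailing newline).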
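A tiny illustration of how re.MULTILINE changes the meaning of $, using a made-up three-line string:

```python
import re

text = "first\nsecond\nthird"

# Without re.MULTILINE, $ matches only at the end of the whole string
print(re.findall(r'\w+$', text))                # ['third']

# With re.MULTILINE, $ also matches just before each '\n'
print(re.findall(r'\w+$', text, re.MULTILINE))  # ['first', 'second', 'third']
```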

import re

# re.MULTILINE lets $ match just before every newline, not only at the end of the string
dict_items = {r'CAUGHT AN ERROR: .*$': r'CAUGHT AN ERROR: XXXXXX'}
for k, v in dict_items.items():
    content = re.sub(k, v, content, flags=re.MULTILINE)

print(content)
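A self-contained sketch of the same fix, assuming a small in-memory string shaped like the log instead of reading sample.log. It also shows a side note: since . does not match a newline unless re.DOTALL is set, the greedy .* already stops at the end of the line, so dropping $ gives the same result for this input.

```python
import re

# Hypothetical sample shaped like the log in the question
log = (
    "1970-01-01 10:59:02    \n"
    "1970-01-01 10:59:01    CAUGHT AN ERROR: rmv: cannot remove '/media/^Red^XXXXXX.jpg': No such file or directory\n"
    "1970-01-01 10:59:02    CAUGHT AN ERROR: rmv: cannot remove '/media/^Blue^XXXXXX.jpg': No such file or directory\n"
    "1970-01-01 10:59:02  \n"
)

# With re.MULTILINE, each error line is rewritten up to its own line end
masked = re.sub(r'CAUGHT AN ERROR: .*$', 'CAUGHT AN ERROR: XXXXXX', log, flags=re.MULTILINE)

# '.' never matches '\n' here (no re.DOTALL), so .* alone also stops at the line end
masked_no_anchor = re.sub(r'CAUGHT AN ERROR: .*', 'CAUGHT AN ERROR: XXXXXX', log)

print(masked)
print(masked == masked_no_anchor)  # True
```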

