Using the findall and sub functions from regex to search for and replace an exact string

Posted 2024-09-30 01:31:17


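Following the thread "Replacing a line in a file based on a keyword search, by line from another file", I am having some difficulty with my real files. I want to search one file for a keyword of the form "PBUSH followed by a number" (the number keeps increasing) and then check whether that keyword also exists in another file. If it does, the data in the line "PBUSH number K some decimals" should be replaced with the line found in the other file, keeping the search keyword unchanged. This should continue to the end of the file. [Image of the file layout not available]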

My modified code (note the findall and sub patterns) is as follows:

import re
path1 = "C:\Users\sony\Desktop\PBUSH1.BDF"
path2 = "C:\Users\sony\Desktop\PBUSH2.BDF"

with open(path1) as f1, open(path2) as f2:
    dat1 = f1.read()
    dat2 = f2.read()

    matches = re.findall('^PBUSH \s [0-9] \s K [0-9 ]+', dat1, flags=re.MULTILINE)
    for match in matches:
        dat2 = re.sub('^{} \s [0-9] \s K \s'.format(match.split(' ')[0]), match, dat2, flags=re.MULTILINE)

with open(path2, 'w') as f:
    f.write(dat2)

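Here my search key is PBUSH, some spaces, then a number; the rest of the pattern follows the layout of the PBUSH line. I cannot get it to work. What could be the reason?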


1 answer
User
Answer #1 · posted 2024-09-30 01:31:17

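In this case it is better to use groups and split the whole line into two parts: one for the phrase to match on (the key) and one for the data.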
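As a quick illustration of the two-part split, here is a minimal sketch on a single made-up line (the sample values and column spacing are assumptions, not taken from the real BDF files). It also shows why the pattern from the question finds nothing: the literal spaces around each `\s` and the single `[0-9]` demand a very specific layout, and `match.split(' ')[0]` only ever yields the word `PBUSH`, without the number.

```python
import re

# Hypothetical sample line; the real column layout is an assumption.
line = "PBUSH   12      K       1.0+5"

# Pattern from the question: every literal space must be present in
# addition to each \s, and [0-9] accepts exactly one digit.
strict = re.compile(r'^PBUSH \s [0-9] \s K [0-9 ]+', flags=re.MULTILINE)

# Grouped pattern: group 1 is the key ("PBUSH", spaces, number, spaces),
# group 0 is the whole line, key plus data.
grouped = re.compile(r'^(PBUSH\s+[0-9]+\s+).*$', flags=re.MULTILINE)

strict_match = strict.search(line)   # no match on this layout
m = grouped.search(line)
key = m.group(1)                     # key part, including its spacing
whole = m.group(0)                   # the complete line
first_token = line.split(' ')[0]     # only "PBUSH", the number is lost

print(strict_match)
print(repr(key))
print(repr(whole))
print(repr(first_token))
```

The file-based version of the same idea follows.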

import re
# must use raw strings for paths, otherwise we need to
# escape \ characters
input1 = r"C:\Users\sony\Desktop\PBUSH1.BDF" 
input2 = r"C:\Users\sony\Desktop\PBUSH2.BDF"

with open(input1) as f1, open(input2) as f2:
    dat1 = f1.read()
    dat2 = f2.read()

# use finditer instead of findall so that we will get 
# a match object for each match.
# For each matching line we also have one subgroup, containing the
# "PBUSH   NNN     " part, whereas the whole regex matches until
# the next end of line
matches = re.finditer(r'^(PBUSH\s+[0-9]+\s+).*$', dat1, flags=re.MULTILINE)

for match in matches:
    # for each match we construct a regex that looks like
    # "^PBUSH   123      .*$", then replace all matches thereof
    # with the contents of the whole line
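    # re.escape keeps the key literal inside the pattern; passing a function
    # as the replacement stops re.sub from interpreting backslashes in the line
    dat2 = re.sub('^{}.*$'.format(re.escape(match.group(1))), lambda _m, line=match.group(0): line, dat2, flags=re.MULTILINE)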

with open(input2, 'w') as outf:
    outf.write(dat2)
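
To try the approach without touching any files, the same logic can be run on in-memory strings. This is only a sketch: the function name `replace_pbush_lines` and the sample data are hypothetical, and it assumes the key (including its spacing) is written identically in both inputs.

```python
import re

# Group 1 captures the key; the full match is the whole line.
KEY_LINE = re.compile(r'^(PBUSH\s+[0-9]+\s+).*$', flags=re.MULTILINE)


def replace_pbush_lines(source: str, target: str) -> str:
    """Overwrite each PBUSH line in `target` with the line from
    `source` that has the same key (hypothetical helper)."""
    for match in KEY_LINE.finditer(source):
        # Build "^<key>.*$" with the key taken literally.
        pattern = '^' + re.escape(match.group(1)) + '.*$'
        # A function replacement inserts the line verbatim.
        target = re.sub(pattern,
                        lambda _m, line=match.group(0): line,
                        target,
                        flags=re.MULTILINE)
    return target


# Made-up sample data standing in for the two files.
source = "PBUSH   1       K       9.0\nPBUSH   2       K       8.0\n"
target = "HEADER\nPBUSH   1       K       1.0\nPBUSH   3       K       3.0\n"

result = replace_pbush_lines(source, target)
print(result)
```

Only the `PBUSH   1` line of the target changes here: `PBUSH   2` has no counterpart in the target, and `PBUSH   3` has none in the source. If the two files pad the key with different amounts of whitespace, the key will not match and the pattern would need to be built from the number alone.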
