Python regex: how to replace every character in a match with the same character (without a function)

Posted 2024-09-29 02:28:41

I am trying to read regex match rules and replacement rules dynamically from a txt file and pass them as arguments to re.sub.

My input is "PERSONALDETAILS TESTY MCTESTER 123 TEST DRIVE...". The data I need to replace has a constant length (100).

My output should be "PERSONALDETAILS *****************************..."

My sample file looks like this:

"rules": 
{
        "(?<=PERSONALDETAILS).{1,100}": "####################################################################################################"
}

The replacement value above is hardcoded as 100 characters. Is there an elegant way to do this without using a lambda function?


2 answers

Is this what you are trying to do? If so, you may not need a regular expression.

txt = '''PERSONALDETAILS TESTY MCTESTER 123 TEST DRIVE...
PERSONALDETAILS THIS IS THE SECOND LINE IN THE TXT FILE...
PERSONALDETAILS THIS IS THE THIRD LINE IN THE SAMPLE TXT FILE...
PERSONALDETAILS THE FOURTH LINE IN TXT FILE...
THIS LINE DOES NOT STARTWITH PERSONALDETAILS...
PERSONAL DETAILS THIS LINE STARTS BUT HAS A SAPCE INBETWEEN...
PERSONALDETAILS LINE SEVEN OF TXT FILE...'''

replaced_txt = ''
print(txt)

for x in txt.split('\n'):
    if x.startswith('PERSONALDETAILS'):
        # Keep the 16-character prefix ('PERSONALDETAILS ') and the trailing '...',
        # and mask the len(x) - 19 characters in between, one '*' per character.
        replaced_txt += ''.join([x[:16], '*' * (len(x) - 19), x[-3:], '\n'])
    else:
        replaced_txt += x + '\n'

print(replaced_txt)

The output will be:

Original file:

PERSONALDETAILS TESTY MCTESTER 123 TEST DRIVE...
PERSONALDETAILS THIS IS THE SECOND LINE IN THE TXT FILE...
PERSONALDETAILS THIS IS THE THIRD LINE IN THE SAMPLE TXT FILE...
PERSONALDETAILS THE FOURTH LINE IN TXT FILE...
THIS LINE DOES NOT STARTWITH PERSONALDETAILS...
PERSONAL DETAILS THIS LINE STARTS BUT HAS A SAPCE INBETWEEN...
PERSONALDETAILS LINE SEVEN OF TXT FILE...

Modified file:

PERSONALDETAILS *****************************...
PERSONALDETAILS ***************************************...
PERSONALDETAILS *********************************************...
PERSONALDETAILS ***************************...
THIS LINE DOES NOT STARTWITH PERSONALDETAILS...
PERSONAL DETAILS THIS LINE STARTS BUT HAS A SAPCE INBETWEEN...
PERSONALDETAILS **********************...
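
If the field really has a fixed width, as the question states, the same idea can be sketched without assuming that every line ends in "...". The helper name `mask_after_prefix` and the constants below are made up for illustration; this is one possible shape, not the answerer's code.

```python
PREFIX = "PERSONALDETAILS"  # marker taken from the question
FIELD_WIDTH = 100           # the question says the data length is constant (100)
MASK_CHAR = "*"             # assumed mask character


def mask_after_prefix(line, prefix=PREFIX, width=FIELD_WIDTH, mask_char=MASK_CHAR):
    """Mask up to `width` characters that follow the first `prefix` in `line`."""
    head, found, tail = line.partition(prefix)
    if not found:
        return line  # prefix absent: leave the line untouched
    masked_len = min(width, len(tail))
    # One mask character per original character, so the line length is preserved.
    return head + found + mask_char * masked_len + tail[masked_len:]


sample = "PERSONALDETAILS TESTY MCTESTER 123 TEST DRIVE..."
masked = mask_after_prefix(sample)
print(masked)
```

Like the regex `(?<=PERSONALDETAILS).{1,100}`, this masks everything after the marker, including the space and the trailing dots, when the line is shorter than the field width.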

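You can store the rules and read them as JSON data.

  1. Keep your rules.

  2. Put placeholders in the rule for the target length instead of hardcoding n characters. You can do this with the str.format() method, as shown here:

    (?<=PERSONALDETAILS).{{{start},{end}}}

  3. Save the rules as a JSON file.

  4. Read the rules back from the JSON file.

  5. Iterate over the rules (keys and values) and apply each replacement.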
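
To make step 2 concrete, here is a small sketch of how the doubled braces behave under str.format(): `{{` and `}}` become literal braces, while `{start}` and `{end}` are substituted. The variable names `template` and `pattern` are only illustrative.

```python
import re

# Doubled braces are literal; {start} and {end} are format placeholders.
template = "(?<=PERSONALDETAILS).{{{start},{end}}}"
pattern = template.format(start=1, end=100)
print(pattern)

# The formatted string is an ordinary regex with a {1,100} quantifier.
match = re.search(pattern, "PERSONALDETAILS abc")
print(repr(match.group()))
```

Note that the quantifier is written without a space after the comma, which keeps it in the plain `{m,n}` form.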

Try this:

import re
text = "PERSONALDETAILS TESTY MCTESTER 123 TEST DRIVE..."
rules = {"(?<=PERSONALDETAILS).{{{start},{end}}}": "####################################################################################################"}

for each_pattern, each_replacement in rules.items():
    text = re.sub(each_pattern.format(start=1, end=100), each_replacement, text)

print(text)
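
The snippet above still hardcodes the 100-character replacement. One way to avoid that without a lambda, assuming the field is always exactly 100 characters as the question says, is to store only the mask character in the rule and build the replacement from the same width used in the pattern. The rule layout below (pattern template mapped to a single character) is a hypothetical variation, not the format from the question.

```python
import re

FIELD_WIDTH = 100  # constant length stated in the question

# Hypothetical rule layout: pattern template -> single mask character.
rules = {"(?<=PERSONALDETAILS).{{{start},{end}}}": "#"}

text = "PERSONALDETAILS" + "X" * FIELD_WIDTH + " TRAILING DATA"

for template, mask_char in rules.items():
    # start == end forces the match to be exactly FIELD_WIDTH characters,
    # so the generated replacement always has the same length as the match.
    pattern = template.format(start=FIELD_WIDTH, end=FIELD_WIDTH)
    text = re.sub(pattern, mask_char * FIELD_WIDTH, text)

print(text)
```

This only preserves length because the match width is fixed. With a range such as `{1,100}`, a shorter match would still be replaced by 100 characters.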

Reading the rules from a JSON file:

import json 

with open("rules.json") as json_file:
    rules = json.load(json_file)["rules"]

Full code:

import re
from json import load, dump


rules = {
    "(?<=PERSONALDETAILS_1).{{{start},{end}}}": "#####################################################",
    "(?<=PERSONALDETAILS_2).{{{start},{end}}}": "*****************************************************",
    "(?<=PERSONALDETAILS_3).{{{start},{end}}}": "=====================================================",
}

# Store the rules under a "rules" key, matching the structure that is read back.
with open('rules.json', 'w') as json_file:
    dump({"rules": rules}, json_file)

with open("rules.json") as json_file:
    rules = load(json_file)["rules"]


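# The sample text uses the PERSONALDETAILS_1 marker so that the first rule applies.
target_text = "PERSONALDETAILS_1 TESTY MCTESTER 123 TEST DRIVE..."
for each_pattern, each_replacement in rules.items():
    # Reassign target_text so that each substitution is kept.
    target_text = re.sub(each_pattern.format(start=1, end=100), each_replacement, target_text)

print(target_text)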

Credit to this question: how to get around "Single '}' encountered in format string" when using .format and formatting in printing @fred-fro
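
For comparison, here is the callable form that the asker wanted to avoid. re.sub accepts a function as the replacement and calls it with each match object, which is what allows the replacement length to follow the match length when the match width varies. The function name `mask_match` is illustrative only.

```python
import re

pattern = r"(?<=PERSONALDETAILS).{1,100}"
text = "PERSONALDETAILS TESTY MCTESTER 123 TEST DRIVE..."


def mask_match(match):
    # Return one '#' per matched character.
    return "#" * len(match.group(0))


masked = re.sub(pattern, mask_match, text)
print(masked)
```

A plain replacement string cannot express "as many characters as were matched", so rules kept purely as strings in a file need either a fixed width or a small wrapper like this on the Python side.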
