How to cut part of a text and substitute it into every line with Python and RegEx

Posted 2024-10-02 18:20:56

Hello, I'm a Python beginner who has just started learning Python and using RegEx for text manipulation. I apologize in advance if I'm breaking any StackOverflow rules.

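I'm writing a Python script that takes (cuts) the date and times from the first line and substitutes them for "Date", "TimeWindowStart" and "TimeWindowEnd" on every following line.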

ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000

I know how to select the date with a regular expression:

([0-9][0-9]|2[0-9])/[0-9][0-9](/[0-9][0-9][0-9][0-9])?

and how to select the time:

([0-9][0-9]|2[0-9]):[0-9][0-9](:[0-9][0-9])?

But I'm stuck on how to select part of the text, copy it, and then find the text I want to replace using the re.sub function.

So the final output would look like this:

ReportDate=, TimeWindowStart=, TimeWindowEnd=

03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000 
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000 
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000 
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000

3 answers

Here is my code:

import re

s = """ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000"""

datereg = r'(\d{2}/\d{2}/\d{4})'
timereg = r'(\d{2}:\d{2}:\d{2})'

dates = re.findall(datereg, s)
times = re.findall(timereg, s)

# replacing one thing at a time
result = re.sub(r'\bDate\b', dates[0],
            re.sub(r'\bTimeWindowEnd\b,', times[1] + ',',
                re.sub(r'\bTimeWindowStart\b,', times[0] + ',',
                    re.sub(timereg, '', 
                        re.sub(datereg, '', s)))))

print(result)

Output:

ReportDate=, TimeWindowStart=, TimeWindowEnd=

03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
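
As a variation on the nested calls above, here is a minimal sketch that does the body replacement in one pass with a replacement function. The names `values` and `placeholder` are made up for this illustration, and it assumes the header always has the `Key=value, Key=value` shape shown in the question.

```python
import re

text = """ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000"""

# Split off the first line; body keeps the leading "\n" of the blank line.
header, _, body = text.partition("\n")

# Collect key=value pairs from the header into a dict (assumed format).
values = dict(re.findall(r"(\w+)=([^,\s]+)", header))
# The body says "Date" where the header says "ReportDate".
values["Date"] = values.pop("ReportDate")

# One alternation of all placeholder names; the callback looks each one up.
placeholder = re.compile(r"\b(" + "|".join(map(re.escape, values)) + r")\b")
new_body = placeholder.sub(lambda m: values[m.group(1)], body)

# Blank out the values in the header, keeping the "Key=" parts.
new_header = re.sub(r"=[^,\s]+", "=", header)

result = new_header + "\n" + new_body
print(result)
```

The word boundaries keep `Date` from matching inside `ReportDate`, and adding another placeholder only means adding another key to the dict.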

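This is only a partial answer, since I don't know the Python APIs for working with text files particularly well. You can read the first line of the file and extract the values for the report date and the start/end window times.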

import re

first = "ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59"
# Each call replaces the whole line with just the captured value.
ReportDate = re.sub(r'ReportDate=([^,]+),.*', r'\1', first)
TimeWindowStart = re.sub(r'.*TimeWindowStart=([^,]+),.*', r'\1', first)
TimeWindowEnd = re.sub(r'.*TimeWindowEnd=(.*)', r'\1', first)

Write out the first line with the three values stripped.

Then you just need to read in each subsequent line and make the following substitutions:

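line = "Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000"
line = re.sub(r'\bDate\b', ReportDate, line)
# No space inside the pattern: \b cannot match between "," and " ",
# and matching the space would delete it from the output.
line = re.sub(r'\bTimeWindowStart\b', TimeWindowStart, line)
line = re.sub(r'\bTimeWindowEnd\b', TimeWindowEnd, line)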

After processing each line this way, you can write it to the output file.
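
The file handling this answer leaves out could look roughly like the sketch below. The function name `fill_report` is hypothetical, and the demo uses in-memory `io.StringIO` objects so it runs without any files on disk; with real files you would pass the objects returned by `open(...)` instead. It assumes the first line always carries all three `Key=value` fields.

```python
import io
import re

def fill_report(src, dst):
    """Copy src to dst, moving the header values into the placeholder lines."""
    first = src.readline().rstrip("\n")

    # Pull the three values out of the header line.
    report_date = re.search(r"ReportDate=([^,]+)", first).group(1)
    window_start = re.search(r"TimeWindowStart=([^,]+)", first).group(1)
    window_end = re.search(r"TimeWindowEnd=(\S+)", first).group(1)

    # Write the header with the values stripped, keeping the "Key=" parts.
    dst.write(re.sub(r"=[^,\s]+", "=", first) + "\n")

    # Substitute the placeholders on every remaining line.
    for line in src:
        line = re.sub(r"\bDate\b", report_date, line)
        line = re.sub(r"\bTimeWindowStart\b", window_start, line)
        line = re.sub(r"\bTimeWindowEnd\b", window_end, line)
        dst.write(line)

# Demo with in-memory "files" standing in for the real input and output.
src = io.StringIO(
    "ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59\n"
    "\n"
    "Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000\n"
)
dst = io.StringIO()
fill_report(src, dst)
output = dst.getvalue()
print(output)
```

Reading line by line keeps memory use flat for large reports, since only the header values have to be remembered.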

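First, you can specify a quantifier in a regex, so if you want 4 digits you don't need [0-9][0-9][0-9][0-9] but can use [0-9]{4}. To capture part of an expression, wrap it in parentheses: value=([0-9]{4}) will give you only the digits.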
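
To make the difference concrete, here is a small sketch comparing a match with and without a capture group. The sample line and the variable names `whole` and `date_only` are only for illustration.

```python
import re

line = "ReportDate=03/24/2019, TimeWindowStart=18:00:00"

# Without a group, the match includes the "ReportDate=" prefix.
whole = re.search(r"ReportDate=[0-9]{2}/[0-9]{2}/[0-9]{4}", line).group(0)

# With parentheses, group(1) holds only the captured date.
date_only = re.search(r"ReportDate=([0-9]{2}/[0-9]{2}/[0-9]{4})", line).group(1)

print(whole)      # ReportDate=03/24/2019
print(date_only)  # 03/24/2019
```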

If you want to use re.sub, you just give it a pattern, a replacement string and your input string, e.g. re.sub(pattern, replacement, string).

So:

import re

txt = """ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
Date, TimeWindowStart, TimeWindowEnd, Report-20190323_210000
"""

pattern_date = 'ReportDate=([0-9]{2}/[0-9]{2}/[0-9]{4})'
report_date = re.findall(pattern_date, txt)[0]

pattern_time_start = 'TimeWindowStart=([0-9]{2}:[0-9]{2}:[0-9]{2})'
start_time = re.findall(pattern_time_start, txt)[0]

pattern_time_end = 'TimeWindowEnd=([0-9]{2}:[0-9]{2}:[0-9]{2})'
end_time = re.findall(pattern_time_end, txt)[0]

splitted = txt.split('\n')  # Split the txt so that we skip the first line

txt2 = '\n'.join(splitted[1:])  # text to perform the sub 

# substitution of your values
txt2 = re.sub('Date', report_date, txt2)
txt2 = re.sub('TimeWindowStart', start_time, txt2)
txt2 = re.sub('TimeWindowEnd', end_time, txt2)

txt_final = splitted[0] + '\n' + txt2
print(txt_final)

Output:

ReportDate=03/24/2019, TimeWindowStart=18:00:00, TimeWindowEnd=20:59:59

03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
03/24/2019, 18:00:00, 20:59:59, Report-20190323_210000
