file1.txt contains the following lines:
[SUM] 0.00-34.53 sec 2.11 GBytes 524 Mbits/sec sender
[SUM] 0.00-34.53 sec 2.11 GBytes 524 Mbits/sec sender
[SUM] 0.00-34.62 sec 2.36 GBytes 586 Mbits/sec sender
[SUM] 0.00-34.62 sec 2.36 GBytes 586 Mbits/sec sender
[SUM] 0.00-34.75 sec 2.39 GBytes 591 Mbits/sec receiver
[SUM] 0.00-34.75 sec 2.39 GBytes 591 Mbits/sec receiver
[0] 0.00-34.53 sec 0.00 Bytes 0.00 bits/sec receiver
[0] 0.00-34.75 sec 0.00 Bytes 0.00 bits/sec sender
I want to write the lines that start with [SUM] and end with sender or receiver to another text file, file2.txt.
My code is as follows:
import re

with open(r"C:\Users\file1.txt", 'r') as f:
    contents = f.read()
s = contents

def my_function1():
    # [SUM] lines containing "sender", skipping rows that report 0.00 Bytes
    regex = r"^\s*\[SUM\]\s*[0-9\-\.]+\s+sec(?!\s+0\.00 Bytes).*sender.*"
    items = re.findall(regex, s, re.MULTILINE)
    for y in items:
        file = open('file2.txt', "a")
        file.write(str(y))
        file.write("\n")
        file.close()

def my_function2():
    # Same pattern, but for lines containing "receiver"
    regex = r"^\s*\[SUM\]\s*[0-9\-\.]+\s+sec(?!\s+0\.00 Bytes).*receiver.*"
    items = re.findall(regex, s, re.MULTILINE)
    for y in items:
        file = open('file2.txt', "a")
        file.write(str(y))
        file.write("\n")
        file.close()
        #print(y)

my_function1()
my_function2()
The output written to file2.txt looks like this:
[SUM] 0.00-34.53 sec 2.11 GBytes 524 Mbits/sec sender
[SUM] 0.00-34.53 sec 2.11 GBytes 524 Mbits/sec sender
[SUM] 0.00-34.62 sec 2.36 GBytes 586 Mbits/sec sender
[SUM] 0.00-34.62 sec 2.36 GBytes 586 Mbits/sec sender
[SUM] 0.00-34.62 sec 2.36 GBytes 586 Mbits/sec sender
[SUM] 0.00-34.75 sec 2.39 GBytes 591 Mbits/sec receiver
[SUM] 0.00-34.75 sec 2.39 GBytes 591 Mbits/sec receiver
Expected: each line printed only once:
[SUM] 0.00-34.53 sec 2.11 GBytes 524 Mbits/sec sender
[SUM] 0.00-34.62 sec 2.36 GBytes 586 Mbits/sec sender
[SUM] 0.00-34.75 sec 2.39 GBytes 591 Mbits/sec receiver
You do not need the re module here, and you do not need to load the whole file into memory either:
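A minimal sketch of that approach, assuming the goal is the deduplicated output shown in the question. The function name `copy_sum_lines`, the `seen` set, and the output mode `"w"` are illustrative choices made for this sketch, not something taken from the original answer.

```python
def copy_sum_lines(src_path, dst_path):
    """Copy each distinct [SUM] sender/receiver line from src_path to dst_path."""
    seen = set()  # lines already written, so repeats are skipped
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:  # one line at a time; the file is never read whole
            line = line.strip()
            if (line.startswith("[SUM]")
                    and line.endswith(("sender", "receiver"))
                    and line not in seen):
                seen.add(line)
                dst.write(line + "\n")
```

Calling `copy_sum_lines("file1.txt", "file2.txt")` on the sample input should keep one copy of each [SUM] line in the order it first appears. Opening the output with `"w"` rather than `"a"` means a second run does not append to the results of the first.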
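If the search is actually more complex and does need the re module, I would still process one line at a time and compile the regular expressions outside the loop. If you need to search for 2 patterns and make sure that matches of the first pattern are written before matches of the second, you can use: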
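One way this could look, as a sketch rather than the answer's original code: the two patterns are adapted from the question, while the names `SENDER`, `RECEIVER`, and `extract_in_order` are hypothetical. It also drops duplicate lines, on the assumption that the deduplicated output from the question is what is wanted.

```python
import re

# Compiled once, outside the loop. Patterns adapted from the question's code.
SENDER = re.compile(
    r"^\s*\[SUM\]\s*[0-9.\-]+\s+sec(?!\s+0\.00 Bytes).*sender\s*$")
RECEIVER = re.compile(
    r"^\s*\[SUM\]\s*[0-9.\-]+\s+sec(?!\s+0\.00 Bytes).*receiver\s*$")


def extract_in_order(src_path, dst_path, patterns=(SENDER, RECEIVER)):
    """Write matches of patterns[0] first, then patterns[1], without repeats."""
    # dicts keep insertion order, so each one acts as an ordered set
    matches = [{} for _ in patterns]
    with open(src_path) as src:
        for line in src:  # still one line at a time
            line = line.rstrip("\n")
            for found, pattern in zip(matches, patterns):
                if pattern.match(line):
                    found[line] = None
                    break
    with open(dst_path, "w") as dst:
        for found in matches:  # first pattern's lines are written first
            for line in found:
                dst.write(line + "\n")
```

Only the distinct matching lines are held in memory, not the whole input, which is what allows the sender lines to be written ahead of the receiver lines even when they appear later in the file.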
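It gives the expected result:
[SUM] 0.00-34.53 sec 2.11 GBytes 524 Mbits/sec sender
[SUM] 0.00-34.62 sec 2.36 GBytes 586 Mbits/sec sender
[SUM] 0.00-34.75 sec 2.39 GBytes 591 Mbits/sec receiver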
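If you want a unique list, just add list(set(items)) before writing to the file. Note that a set does not preserve order; list(dict.fromkeys(items)) removes duplicates while keeping each line at its first position.

Just use awk: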
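As you can see, you do not need a regexp as complex as the one in your question for the sample input you posted, but if you did, you would probably want something like this (using GNU awk for \s; other awks use [[:space:]]):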