Python: How to skip lines containing extra characters when using regular expressions?

Sample input. The goal is to capture the value after each date while skipping the DATE/USER lines that have extra characters ("* blahN *") after that value:

[random lines of text]
DATE/USER: 07/01/15 string1
[random lines of text]
DATE/USER: 07/12/15 string2
[random lines of text]
DATE/USER: 07/04/15 string3
[random lines of text]
DATE/USER: 07/12/15 string4
[random lines of text]
DATE/USER: 07/05/15 string5 * blah1 *
[random lines of text]
DATE/USER: 07/02/15 string6
[random lines of text]
DATE/USER: 07/08/15 string7
[random lines of text]
DATE/USER: 07/11/15 string8 * blah2 *
[random lines of text]
DATE/USER: 07/03/15 string9
[random lines of text]
DATE/USER: 07/10/15 string10 * blah3 *
[random lines of text]

3 answers

User

Answer 1 · edited 2024-09-26 17:59:37

import re
re.findall(r'DATE/USER: \d\d/\d\d/\d\d\s+([A-Z])', line)
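A minimal sketch of applying this pattern line by line. Like answer 3, it assumes each entry is a single capital letter; the sample below is hypothetical (codes A to J, the same date on every line for brevity, with "* blah *" after E, H and J). Because nothing anchors the letter to the end of the line, the "* blah *" lines are still matched.

```python
import re

# Hypothetical sample: codes A-J, where E, H and J are followed by "* blah *"
CODES = ["A", "B", "C", "D", "E * blah1 *", "F", "G", "H * blah2 *", "I", "J * blah3 *"]
LINES = ["[random lines of text]\n"] + [f"DATE/USER: 07/01/15 {c}\n" for c in CODES]

PATTERN = r"DATE/USER: \d\d/\d\d/\d\d\s+([A-Z])"

found = []
for line in LINES:
    # findall returns the captured letter for each match on the line
    found.extend(re.findall(PATTERN, line))

# Nothing anchors the letter to the end of the line, so E, H and J are kept too.
print(found)  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
```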

User

Answer 2 · edited 2024-09-26 17:59:37

Based on your edit, you can simply split each line:

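with open("in.txt") as f:
    for line in f:
        # Only look at the DATE/USER lines
        if line.startswith("DATE/USER:"):
            spl = line.split()  # e.g. ['DATE/USER:', '07/01/15', 'string1']
            # Lines with a trailing "* blah *" split into more than three fields
            if len(spl) == 3:
                print(spl[2])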
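The same split logic as a self-contained sketch. Here io.StringIO stands in for in.txt (an assumption so the example runs without a file), and the hypothetical helper third_fields returns the values instead of printing them. A DATE/USER line with a trailing "* blah *" splits into six fields and is skipped.

```python
import io

# Hypothetical sample standing in for in.txt
SAMPLE_TEXT = (
    "[random lines of text]\n"
    "DATE/USER: 07/01/15 string1\n"
    "DATE/USER: 07/05/15 string5 * blah1 *\n"
    "[random lines of text]\n"
    "DATE/USER: 07/02/15 string6\n"
)


def third_fields(lines):
    """Return the third field of each DATE/USER line that has exactly three fields."""
    result = []
    for line in lines:
        if line.startswith("DATE/USER:"):
            fields = line.split()  # splitting on whitespace also drops the newline
            if len(fields) == 3:
                result.append(fields[2])
    return result


# io.StringIO behaves like an open text file: iterating it yields lines.
print(third_fields(io.StringIO(SAMPLE_TEXT)))  # ['string1', 'string6']
```

Passing open("in.txt") instead of the StringIO gives the behavior of the original loop.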

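Output: for the sample input above, this prints string1, string2, string3, string4, string6, string7 and string9, one per line.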

Using re:

import re

# One capture group: the value after the date, which must end the line ($)
r = re.compile(r'^DATE/USER:\s+\d+/\d+/\d+\s+(\w+)$')

with open("in.txt") as f:
    for line in f:
        match = r.match(line)
        if match:
            print(match.group(1))
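If the whole file fits in memory, the same anchored pattern can be run once over the full text with re.MULTILINE, so that ^ and $ match at every line boundary. A sketch under that assumption, using a short hypothetical inline sample instead of in.txt:

```python
import re

# Hypothetical inline sample instead of reading in.txt
TEXT = (
    "[random lines of text]\n"
    "DATE/USER: 07/01/15 string1\n"
    "DATE/USER: 07/05/15 string5 * blah1 *\n"
    "[random lines of text]\n"
    "DATE/USER: 07/02/15 string6\n"
    "DATE/USER: 07/11/15 string8 * blah2 *\n"
)

# re.MULTILINE makes ^ and $ match at every line boundary.
# [ \t]+ is used instead of \s+ so a match cannot run across a newline.
ENTRY = re.compile(r"^DATE/USER:[ \t]+\d+/\d+/\d+[ \t]+(\w+)$", re.MULTILINE)

print(ENTRY.findall(TEXT))  # ['string1', 'string6']
```

The switch from \s+ to [ \t]+ matters only in multiline mode, where \s would otherwise be free to match a line break.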

Output: the same as for the split version.

User

Answer 3 · edited 2024-09-26 17:59:37

The "$" at the end of the pattern below excludes any line where "* blah *" follows the letter:

import re
rphfind = re.findall(r'(?<=DATE/USER: \d\d/\d\d/\d\d)\s+([A-Z])$', line)

This matches only A, B, C, D, F, G and I, i.e. the entries without a trailing "* blah *".
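A sketch that runs the anchored lookbehind pattern over each line of a hypothetical sample with single-letter codes (same date on every line for brevity; E, H and J carry "* blah *"). The lines keep their trailing newline, as they would when iterating over a file; this still works because $ also matches just before a final newline.

```python
import re

# Hypothetical sample with single-letter codes; E, H and J carry "* blah *".
CODES = ["A", "B", "C", "D", "E * blah1 *", "F", "G", "H * blah2 *", "I", "J * blah3 *"]
# Each line keeps its trailing "\n", as it would when iterating over a file.
LINES = [f"DATE/USER: 07/01/15 {code}\n" for code in CODES]

# Fixed-width lookbehind for the prefix and date; "$" (which also matches just
# before a final "\n") requires the letter to be the last thing on the line.
ANCHORED = r"(?<=DATE/USER: \d\d/\d\d/\d\d)\s+([A-Z])$"

matches = [m for line in LINES for m in re.findall(ANCHORED, line)]
print(matches)  # ['A', 'B', 'C', 'D', 'F', 'G', 'I']
```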

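Without the "$", the capture group ([A-Z]) still grabs only the single uppercase letter, but every DATE/USER line matches (A through J in the example).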
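For comparison, a hypothetical illustration of both variants side by side, on the same kind of single-letter sample as above (an assumption; the helper name letters is illustrative). Only the trailing "$" differs between the two patterns.

```python
import re

# Hypothetical sample with single-letter codes; E, H and J carry "* blah *".
CODES = ["A", "B", "C", "D", "E * blah1 *", "F", "G", "H * blah2 *", "I", "J * blah3 *"]
LINES = [f"DATE/USER: 07/01/15 {code}\n" for code in CODES]

UNANCHORED = r"(?<=DATE/USER: \d\d/\d\d/\d\d)\s+([A-Z])"
ANCHORED = UNANCHORED + "$"


def letters(pattern):
    """Collect the captured letter from every line that matches pattern."""
    return [m for line in LINES for m in re.findall(pattern, line)]


print(letters(UNANCHORED))  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']
print(letters(ANCHORED))    # ['A', 'B', 'C', 'D', 'F', 'G', 'I']
```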


Not sure which version you're looking for.