Python regex: matching across multiple lines, but still getting line numbers

import re

string = """
####1
ttteest
####1
ttttteeeestt

####2

ttest
####2
"""

pattern = '.*?####(.*?)####'
matches = re.compile(pattern, re.MULTILINE | re.DOTALL).findall(string)
for item in matches:
    print("lineno: ?", "matched: ", item)

3 answers

User

Answer 1 · edited 2024-09-27 01:25:42

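What you need is a classic task that regular expressions are not good at: parsing.

You can read the log file line by line and search each line for the string that delimits your blocks. You could apply a regex line by line, but unless you are looking for a complex pattern it is less efficient than plain string matching.

If you are looking for a complex match, I'd like to see it. Searching each line of the file for #### while keeping a line count is easier without a regex.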
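A minimal sketch of that line-by-line approach, assuming (as in the question) that any line containing `####` opens or closes a block, and that you want the 1-based number of the opening line. `find_blocks` is a hypothetical helper name, not part of any library; it uses only a substring test per line, no regex.

```python
def find_blocks(text, delimiter='####'):
    """Return (start_line, content) pairs for text between pairs of delimiter lines.

    Line numbers are 1-based. A line containing the delimiter alternately
    opens and closes a block (a hypothetical convention taken from the question).
    """
    blocks = []
    start = None      # line number of the opening delimiter, or None when outside a block
    collected = []    # lines gathered for the block currently open
    for lineno, line in enumerate(text.splitlines(), 1):
        if delimiter in line:
            if start is None:
                start = lineno
                # keep whatever follows the delimiter on the opening line (e.g. "1")
                collected = [line.split(delimiter, 1)[1]]
            else:
                blocks.append((start, '\n'.join(collected)))
                start = None
        elif start is not None:
            collected.append(line)
    return blocks
```

Run on the question's `string`, this returns `[(2, '1\nttteest'), (7, '2\n\nttest')]`, pairing the delimiters the same way the question's regex does; line 1 is the empty line right after the opening `"""`.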

User

Answer 2 · edited 2024-09-27 01:25:42

You can record where each line ends up front, then look up which line each match falls on.

import re

string="""
####1
ttteest
####1
ttttteeeestt

####2

ttest
####2
"""

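# Record where each line ends: line_ends[i] is the offset just past line i's '\n'
# (line 0 is the empty line right after the opening triple quote; this assumes
# the text ends with a newline, otherwise the last line is not recorded).
end = '.*\n'
line_ends = []
for m in re.finditer(end, string):
    line_ends.append(m.end())

pattern = '.*?####(.*?)####'
match = re.compile(pattern, re.MULTILINE | re.DOTALL)
for m in match.finditer(string):
    # A match is on the first line whose end offset lies past the start of group 1.
    lineno = next(i for i in range(len(line_ends)) if line_ends[i] > m.start(1))
    print('lineno :%d, %s' % (lineno, m.group(1)))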
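The `next(...)` lookup above walks the list of line ends from the beginning for every match. A minimal sketch of the same idea that swaps in a binary search (`bisect.bisect_left` from the standard library) over the newline offsets; `finditer_with_lineno` and its `group` parameter are hypothetical names chosen for illustration. Line numbers are 0-based, as in the loop above.

```python
import bisect
import re


def finditer_with_lineno(pattern, string, flags=0, group=0):
    """Yield (line_number, match) pairs; line numbers are 0-based.

    Hypothetical helper: the line is computed from the start of `group`
    (group 0 is the whole match).
    """
    # Offsets of every '\n' in the text, in increasing order.
    newline_offsets = [m.start() for m in re.finditer('\n', string)]
    for m in re.finditer(pattern, string, flags):
        # bisect_left counts the newlines strictly before the position,
        # which is exactly the 0-based index of the line containing it.
        yield bisect.bisect_left(newline_offsets, m.start(group)), m
```

With the question's pattern, `re.DOTALL` and `group=1`, this reports lines 1 and 6 for the two matches, the same numbers the loop above prints, but each lookup costs O(log n) instead of a linear scan.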

User

Answer 3 · edited 2024-09-27 01:25:42

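This can be done fairly efficiently by:

1. Finding all matches.
2. Looping over the newlines, storing an {offset: line_number} mapping up to the last match.
3. For each match, searching backwards for the offset of the nearest preceding newline and looking up its line number in the map.

This avoids counting back to the start of the file for every match.

The following function works like re.finditer: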

import re


def finditer_with_line_numbers(pattern, string, flags=0):
    '''
    A version of 're.finditer' that yields '(match, line_number)' pairs
    (line numbers are 0-based).
    '''
    matches = list(re.finditer(pattern, string, flags))
    if not matches:
        return  # this is a generator, so a bare 'return' just ends the iteration

    end = matches[-1].start()
    # -1 so a failed 'rfind' maps to the first line.
    newline_table = {-1: 0}
    for i, m in enumerate(re.finditer(r'\n', string), 1):
        # don't find newlines past our last match
        offset = m.start()
        if offset > end:
            break
        newline_table[offset] = i

    # Failing to find the newline is OK, -1 maps to 0.
    for m in matches:
        newline_offset = string.rfind('\n', 0, m.start())
        line_number = newline_table[newline_offset]
        yield (m, line_number)

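If you also need the content of the line, you can replace the last loop with:

    for m in matches:
        newline_offset = string.rfind('\n', 0, m.start())
        newline_end = string.find('\n', m.end())
        if newline_end == -1:  # no newline after the match: the line runs to the end of the string
            newline_end = len(string)
        line = string[newline_offset + 1:newline_end]
        line_number = newline_table[newline_offset]
        yield (m, line_number, line)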

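Note that it would be better to avoid building a list from finditer, but then we wouldn't know when to stop storing newlines (even if the pattern only matches near the start of the file, we could end up storing many newlines).

If avoiding storing all the matches is important, you could write an iterator that scans for newlines on demand, though I'm not sure how much of an advantage that would give you in practice.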
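A minimal sketch of that on-demand idea, assuming 0-based line numbers as in the function above. Instead of a lookup table it keeps a running count, using `str.count` with start and end offsets to count only the newlines between consecutive matches, so neither the matches nor the newline offsets are stored. `finditer_with_line_numbers_lazy` is a hypothetical name.

```python
import re


def finditer_with_line_numbers_lazy(pattern, string, flags=0):
    """Yield (match, line_number) pairs without building a list of matches.

    Hypothetical variant: newlines are counted on demand between consecutive
    match starts, so only a running counter is kept. Line numbers are 0-based.
    """
    line_number = 0
    scanned_to = 0  # offset up to which newlines have already been counted
    for m in re.finditer(pattern, string, flags):
        # Count only the newlines in the gap since the previous match start.
        line_number += string.count('\n', scanned_to, m.start())
        scanned_to = m.start()
        yield m, line_number
```

Because `re.finditer` yields matches in order of non-decreasing start offset, the counted gaps never overlap, so each newline is examined at most once and the total work stays linear in the length of the string.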
