<p><a href="https://stackoverflow.com/questions/34045227/python-equivalent-for-grep-c-n#comment55844990_34045227">As recommended</a> by <a href="https://stackoverflow.com/users/20862/ignacio-vazquez-abrams">Ignacio Vazquez-Abrams</a>, use <a href="https://docs.python.org/2/library/collections.html#collections.deque" rel="nofollow noreferrer">a deque</a> to store the last <em>n</em> lines. Once that many lines are present, call <code>popleft</code> for each new line added. When your regular expression finds a match, return the previous <em>n</em> lines in the stack, then iterate over <em>n</em> more lines and return those as well.</p>
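<p>As a minimal sketch of just the sliding-window part (the helper name <code>last_n_lines</code> is made up for illustration), appending each new line and calling <code>popleft</code> once the deque grows past <em>n</em> keeps at most <em>n</em> lines in memory:</p>

```python
from collections import deque


def last_n_lines(lines, n):
    """Return the last n items of an iterable, holding at most n + 1 at once."""
    window = deque()
    for line in lines:
        window.append(line)
        if len(window) > n:
            window.popleft()  # discard the oldest line
    return list(window)


print(last_n_lines(["a", "b", "c", "d", "e"], 2))  # ['d', 'e']
```

<p>If you never need to inspect the discarded line, <code>deque(maxlen=n)</code> drops the oldest item automatically and gives the same window.</p>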
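<p>This approach saves you from iterating over any line twice (DRY) and keeps only a minimal amount of data in memory. You also mentioned the need for Unicode, so handling the file encoding and adding the Unicode flag to the regex search is important. Also, the other answer uses <code>re.match()</code> instead of <code>re.search()</code>. Because <code>re.match()</code> only matches at the beginning of a string, that may have unintended consequences.</p>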
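<p>To illustrate the difference, here is a small sketch with a made-up sample line. It assumes Python 3, where string literals are Unicode by default:</p>

```python
import re

line = "fatal error: café not found"  # made-up sample line

# re.match() only tries to match at the very start of the string
at_start = re.match("error", line)

# re.search() scans the whole string for the first match
anywhere = re.search("error", line)

# with re.IGNORECASE | re.UNICODE, case folding also covers non-ASCII letters
unicode_hit = re.search("CAFÉ", line, re.IGNORECASE | re.UNICODE)

print(at_start)               # None
print(anywhere.group(0))      # error
print(unicode_hit.group(0))   # café
```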
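<p>Below is an example. It iterates over every line in the file only <em>once</em>, which means that context lines which also contain a hit are not examined again. This may or may not be the desired behavior, but it can easily be tweaked to highlight or otherwise flag lines with additional hits inside the context of a previous hit.</p>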
<pre><code>#!/usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
import re
from collections import deque


def grep(pattern, input_file, context=0, case_sensitivity=True, file_encoding='utf-8'):
    stack = deque()
    hits = []
    # None means no hit is pending; otherwise the number of trailing
    # context lines still to be collected for the current hit
    lines_remaining = None
    flags = re.UNICODE if case_sensitivity else re.IGNORECASE | re.UNICODE

    with codecs.open(input_file, mode='rb', encoding=file_encoding) as f:
        for line in f:
            # append next line to stack
            stack.append(line)

            # keep adding context after hit found (without popping off previous lines of context)
            if lines_remaining is not None:
                lines_remaining -= 1
                if lines_remaining == 0:
                    # trailing context is complete: store the hit and start over
                    hits.append(stack)
                    lines_remaining = None
                    stack = deque()
                continue  # go to next line in file

            # if stack exceeds needed context, pop leftmost line off stack
            # (but include current line with possible search hit if applicable)
            if len(stack) > context + 1:
                stack.popleft()

            # search line for pattern
            if re.search(pattern, line, flags):
                if context == 0:
                    # no trailing context wanted: store the hit right away
                    hits.append(stack)
                    stack = deque()
                else:
                    lines_remaining = context

    # in case there are not enough lines left in the file to provide trailing context
    if lines_remaining is not None and stack:
        hits.append(stack)

    # return list of deques containing hits with context
    return hits  # you'll probably want to format the output, this is just an example
</code></pre>
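<p>One possible way to format the result, as a rough sketch: the helper name <code>format_hits</code> and the sample data are made up for illustration, and the <code>--</code> separator only imitates what GNU grep prints between context groups. It accepts the list of deques returned by <code>grep</code> above:</p>

```python
import sys
from collections import deque


def format_hits(hits, separator="--"):
    """Join groups of lines into one string, with a separator line between groups."""
    blocks = []
    for group in hits:
        # make sure every line ends with a newline (the last line of a file may not)
        lines = [l if l.endswith("\n") else l + "\n" for l in group]
        blocks.append("".join(lines))
    return (separator + "\n").join(blocks)


# hand-built sample in the same shape as the return value of grep()
sample = [deque(["a\n", "match 1\n", "b\n"]), deque(["c\n", "match 2"])]
sys.stdout.write(format_hits(sample))
```

<p>For the sample above this writes the two groups with a single <code>--</code> line between them.</p>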