<p>Given:</p>
<pre><code>$ cat file.txt
...some lines before this...
MY TEST MATRIX (ROWS)
0.5056E+03 0.8687E-03 -0.1202E-02
0.5056E+03 0.8687E-03 -0.1202E-02
MY TEST END
0.5056E+03 0.8687E-03 -0.1202E-02
0.3776E+03 0.8687E-03 0.1975E-04
STOP
-some lines after this
MY TEST MATRIX (ROWS)
2E+04 2E+04 0.8687E-03
2E+04 2E+04 0.8687E-03
MY TEST END
0.5056E+03 0.8687E-03 -0.1202E-02
0.5056E+03 0.8687E-03 -0.1202E-02
STOP
-some lines after this
-this repeats in txt file
</code></pre>
<p>In <code>sed</code>, <code>perl</code>, or <code>awk</code>, you can use a regex range (a "flip-flop") to print everything from a start pattern through an end pattern:</p>
<pre><code>$ sed -nE '/^MY TEST MATRIX/,/^MY TEST END/p' file.txt
MY TEST MATRIX (ROWS)
0.5056E+03 0.8687E-03 -0.1202E-02
0.5056E+03 0.8687E-03 -0.1202E-02
MY TEST END
MY TEST MATRIX (ROWS)
2E+04 2E+04 0.8687E-03
2E+04 2E+04 0.8687E-03
MY TEST END
</code></pre>
<p>You can replicate this behavior in Python with a <code>FlipFlop</code> class:</p>
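<pre><code>class FlipFlop:
    ''' Class to imitate the behavior of the /start/,/end/ flip-flop in awk '''
    def __init__(self, start_pattern, end_pattern):
        # Both patterns are expected to be compiled regexes
        self.patterns = start_pattern, end_pattern
        self.state = False   # False: outside a block, True: inside one

    def __call__(self, st):
        ms = [e.search(st) for e in self.patterns]
        if all(ms):
            # Start and end match on the same line: the block ends here
            self.state = False
            return True
        rtr = self.state     # remember whether we were inside before this line
        # bool indexes the tuple: look for start when outside, end when inside
        if ms[self.state]:
            self.state = not self.state
        # True for the start line, the lines in between, and the end line
        return self.state or rtr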
</code></pre>
<p>Then capture the blocks while reading the file line by line:</p>
<pre><code>import re

fn = 'file.txt'   # path to the input file shown above
di = {}
blocks = [FlipFlop(re.compile(r'^MY TEST MATRIX \(ROWS\)'), re.compile(r'^MY TEST END')),
          FlipFlop(re.compile(r'^MY TEST END'), re.compile(r'^STOP'))]
for i, ff in enumerate(blocks):
    # One pass over the file per flip-flop
    with open(fn) as f:
        di[i] = [line.strip() for line in f if ff(line)]
</code></pre>
<p>Result:</p>
<pre><code>>>> di
{0: ['MY TEST MATRIX (ROWS)',
'0.5056E+03 0.8687E-03 -0.1202E-02',
'0.5056E+03 0.8687E-03 -0.1202E-02',
'MY TEST END',
'MY TEST MATRIX (ROWS)',
'2E+04 2E+04 0.8687E-03',
'2E+04 2E+04 0.8687E-03',
'MY TEST END'],
1: ['MY TEST END',
'0.5056E+03 0.8687E-03 -0.1202E-02',
'0.3776E+03 0.8687E-03 0.1975E-04',
'STOP',
'MY TEST END',
'0.5056E+03 0.8687E-03 -0.1202E-02',
'0.5056E+03 0.8687E-03 -0.1202E-02',
'STOP']}
</code></pre>
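<p>Note that each list in <code>di</code> concatenates every occurrence of its block. If the occurrences should stay separate, one possible variation is a generator that yields one list per block. This is only a sketch of a different technique, not the <code>FlipFlop</code> class itself; the name <code>iter_blocks</code> and the sample lines are hypothetical.</p>

```python
import re

def iter_blocks(lines, start_pattern, end_pattern):
    """Yield each start..end block as its own list, delimiters included.

    Hypothetical helper: same range semantics as the flip-flop, but the
    occurrences are kept apart instead of being concatenated.
    """
    block = None  # None means we are currently outside a block
    for line in lines:
        line = line.strip()
        if block is None:
            if start_pattern.search(line):
                block = [line]
                if end_pattern.search(line):
                    # start and end on the same line: one-line block
                    yield block
                    block = None
        else:
            block.append(line)
            if end_pattern.search(line):
                yield block
                block = None
    if block is not None:
        # unterminated block at end of input: emit what was collected
        yield block


# Made-up sample lines standing in for the real file
sample = ['before', 'MY TEST MATRIX (ROWS)', '1 2', 'MY TEST END',
          'between', 'MY TEST MATRIX (ROWS)', '3 4', 'MY TEST END']
matrices = list(iter_blocks(sample,
                            re.compile(r'^MY TEST MATRIX \(ROWS\)'),
                            re.compile(r'^MY TEST END')))
print(matrices)
```

<p>With a real file, the same call should work on an open file object in place of <code>sample</code>, since the generator only iterates over lines.</p>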
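<p>This reads the file once per <code>FlipFlop</code> (twice here) in order to save memory; if speed matters more, you can read the file into memory once and iterate over that instead.</p>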
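<p>A minimal sketch of the in-memory variant, assuming the whole file fits comfortably in RAM. The class is repeated only so the snippet runs on its own; the helper name <code>capture_blocks</code> and the sample text are hypothetical, and with a real file <code>sample</code> would be replaced by <code>f.readlines()</code>.</p>

```python
import re

class FlipFlop:
    """Same flip-flop as above, repeated so this sketch is self-contained."""
    def __init__(self, start_pattern, end_pattern):
        self.patterns = start_pattern, end_pattern
        self.state = False

    def __call__(self, st):
        ms = [p.search(st) for p in self.patterns]
        if all(ms):
            # start and end match on the same line
            self.state = False
            return True
        was_on = self.state
        if ms[self.state]:
            self.state = not self.state
        return self.state or was_on


def capture_blocks(lines, pattern_pairs):
    """Filter an in-memory list of lines once per (start, end) pattern pair."""
    result = {}
    for i, (start, end) in enumerate(pattern_pairs):
        ff = FlipFlop(re.compile(start), re.compile(end))
        result[i] = [line.strip() for line in lines if ff(line)]
    return result


# Made-up sample standing in for f.readlines() on the real file
sample = """\
header
MY TEST MATRIX (ROWS)
1 2 3
MY TEST END
4 5 6
STOP
trailer
""".splitlines(keepends=True)

di = capture_blocks(sample, [(r'^MY TEST MATRIX \(ROWS\)', r'^MY TEST END'),
                             (r'^MY TEST END', r'^STOP')])
print(di)
```

<p>The file is then read from disk only once, at the cost of holding every line in memory while the flip-flops run.</p>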