<p>This should do it:</p>
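<pre><code>import re
from pprint import pprint

# Group 1: the section name; group 2: the indented lines that follow it
reg = re.compile(r"(\w+):\n((?:\s+\w+(?:\n|$))*)")

with open('file.txt', 'r') as f:
    # findall returns (name, lines) tuples; split() turns the lines into a list of words
    data = {
        name: lines.split()
        for name, lines in reg.findall(f.read())
    }

pprint(data)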
</code></pre>
<p>Output:</p>
<pre><code>{'lineA': ['line1', 'line2', 'line3'],
 'lineB': ['line4', 'line5', 'line6'],
 'lineC': ['line7', 'line8', 'line9']}
</code></pre>
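<p>The contents of <code>file.txt</code> are not shown here, so the following is only a sketch that assumes an input shape inferred from the output above: each section name is followed by a colon and then by indented one-word lines. The <code>SAMPLE</code> string and the <code>parse_sections</code> helper are hypothetical names introduced for illustration; the pattern is the same one used above, applied to a string instead of a file.</p>

```python
import re

# Assumed input shape (inferred from the output; the real file.txt is not shown).
SAMPLE = (
    "lineA:\n"
    "  line1\n"
    "  line2\n"
    "  line3\n"
    "lineB:\n"
    "  line4\n"
    "  line5\n"
    "  line6\n"
    "lineC:\n"
    "  line7\n"
    "  line8\n"
    "  line9\n"
)

reg = re.compile(r"(\w+):\n((?:\s+\w+(?:\n|$))*)")


def parse_sections(text):
    """Map each section name to the list of words on its indented lines."""
    return {name: lines.split() for name, lines in reg.findall(text)}


data = parse_sections(SAMPLE)
print(data)
```

<p>Note that the pattern requires whitespace before each item (<code>\s+</code>), so under this assumption the items must be indented; unindented items would not be picked up.</p>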
<hr/>
<h3>Regex explanation</h3>
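<p>The pattern captures two main groups: <code>(\w+)</code> and <code>((?:\s+\w+(?:\n|$))*)</code></p>
<p>All other groups are non-capturing, so <code>findall</code> returns simple <code>(name, lines)</code> tuples that are easy to work with</p>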
<pre><code>(\w+)           Capture a word in group 1
:\n             Group 1 is followed by :\n
(               Start the capture for group 2
  (?:           Start a non-capturing group for the repeated content
    \s+\w+      Whitespace followed by a word
    (?:\n|$)    Followed by a newline or the end of the file
  )*            The non-capturing group repeats zero or more times
)               End group 2
</code></pre>
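<p>The same breakdown can be kept next to the pattern itself by compiling it with <code>re.VERBOSE</code>, which ignores whitespace in the pattern and allows <code>#</code> comments. This is a sketch rather than part of the original answer; the names <code>VERBOSE_REG</code>, <code>COMPACT_REG</code> and the sample <code>text</code> are hypothetical.</p>

```python
import re

# The same pattern written in verbose mode, with the explanation as inline comments.
VERBOSE_REG = re.compile(
    r"""
    (\w+)            # group 1: a word (the section name)
    :\n              # followed by a colon and a newline
    (                # start of group 2
      (?:            #   non-capturing group for each repeated line
        \s+\w+       #   whitespace followed by a word
        (?:\n|$)     #   then a newline or the end of the input
      )*             #   repeated zero or more times
    )                # end of group 2
    """,
    re.VERBOSE,
)

# The compact one-line form, for comparison.
COMPACT_REG = re.compile(r"(\w+):\n((?:\s+\w+(?:\n|$))*)")

# Hypothetical input: two sections, the last one without a trailing newline.
text = "lineA:\n  line1\n  line2\nlineB:\n  line3"

matches = VERBOSE_REG.findall(text)
print(matches)
```

<p>Because only two groups capture, each match is a two-item tuple: the section name and the raw block of lines, which can then be split into words.</p>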