<p>Praise the power of regular expressions:</p>
<pre><code>import re

states_rx = re.compile(r'''
^
(?P<state>.+?)\[edit\]
(?P<cities>[\s\S]+?)
(?=^.*\[edit\]$|\Z)
''', re.MULTILINE | re.VERBOSE)
cities_rx = re.compile(r'''^[^()\n]+''', re.MULTILINE)
# lst_ is the list of lines from the question
transformed = '\n'.join(lst_)
result = {state.group('state'): [city.group(0).rstrip()
                                 for city in cities_rx.finditer(state.group('cities'))]
          for state in states_rx.finditer(transformed)}
print(result)
</code></pre>
<p>This produces a dictionary that maps each state to the list of its cities.</p>
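<p>To try the whole pipeline end to end, here is a self-contained run. The input is a small hypothetical sample in the <code>State[edit]</code> / <code>City (University)[n]</code> layout the patterns assume; <code>lst_</code> stands in for the list from the question.</p>

```python
import re

# Hypothetical sample input: a "[edit]" line per state,
# followed by that state's city lines
lst_ = [
    'Alabama[edit]',
    'Auburn (Auburn University)[1]',
    'Florence (University of North Alabama)',
    'Alaska[edit]',
    'Fairbanks (University of Alaska Fairbanks)[2]',
]

states_rx = re.compile(r'''
^
(?P<state>.+?)\[edit\]
(?P<cities>[\s\S]+?)
(?=^.*\[edit\]$|\Z)
''', re.MULTILINE | re.VERBOSE)
cities_rx = re.compile(r'^[^()\n]+', re.MULTILINE)

transformed = '\n'.join(lst_)
result = {
    state.group('state'): [city.group(0).rstrip()
                           for city in cities_rx.finditer(state.group('cities'))]
    for state in states_rx.finditer(transformed)
}
print(result)
# {'Alabama': ['Auburn', 'Florence'], 'Alaska': ['Fairbanks']}
```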
<p/><hr/>
<h3>Explanation:</h3>
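<p>The idea is to split the task into several smaller subtasks:</p>
<ol>
<li>Join the complete list into a single newline-separated string</li>
<li>Separate the states</li>
<li>Separate the cities</li>
<li>Build a dict comprehension over all items found</li>
</ol>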
<p/><hr/>
<p><strong>First subtask</strong></p>
<pre><code>transformed = '\n'.join(lst_)
</code></pre>
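<p>Joining matters because both patterns use <code>re.MULTILINE</code>, which makes <code>^</code> and <code>$</code> match at every line boundary inside the one joined string. A tiny illustration on hypothetical lines:</p>

```python
import re

# Hypothetical lines in the same layout as the question's list
lines = ['Alabama[edit]', 'Auburn (Auburn University)[1]']
transformed = '\n'.join(lines)
print(repr(transformed))
# 'Alabama[edit]\nAuburn (Auburn University)[1]'

# With re.MULTILINE, ^ matches at the start of every line of the joined string
line_starts = [m.start() for m in re.finditer(r'^', transformed, re.MULTILINE)]
print(line_starts)
# [0, 14]
```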
<p><strong>Second subtask</strong></p>
<pre><code>^ # match start of the line
(?P<state>.+?)\[edit\] # capture anything in that line up to [edit]
(?P<cities>[\s\S]+?) # afterwards lazily capture anything, newlines included, up to
(?=^.*\[edit\]$|\Z) # ... either another state or the very end of the string
</code></pre>
<p>See <a href="https://regex101.com/r/ht9rTp/4" rel="nofollow noreferrer"><strong>the demo on regex101.com</strong></a>.</p>
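<p>To see what the <code>state</code> and <code>cities</code> groups capture, the same pattern can be run on a small hypothetical joined string. Note that <code>cities</code> still carries its surrounding newlines; the city pattern in the next step only picks out the line contents.</p>

```python
import re

states_rx = re.compile(r'''
^
(?P<state>.+?)\[edit\]
(?P<cities>[\s\S]+?)
(?=^.*\[edit\]$|\Z)
''', re.MULTILINE | re.VERBOSE)

# Hypothetical joined input: two states, one city each
transformed = ('Alabama[edit]\n'
               'Auburn (Auburn University)[1]\n'
               'Alaska[edit]\n'
               'Fairbanks (University of Alaska Fairbanks)[2]')

# The lookahead stops each block just before the next "[edit]" line (or at the end)
blocks = [(m.group('state'), m.group('cities'))
          for m in states_rx.finditer(transformed)]
for state, cities in blocks:
    print(state, repr(cities))
# Alabama '\nAuburn (Auburn University)[1]\n'
# Alaska '\nFairbanks (University of Alaska Fairbanks)[2]'
```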
<p><strong>Third subtask</strong></p>
<pre><code>^[^()\n]+ # match at the start of a line, then one or more characters that are not a newline, ( or )
</code></pre>
<p>See <a href="https://regex101.com/r/ht9rTp/2" rel="nofollow noreferrer"><strong>another demo on regex101.com</strong></a>.</p>
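<p>Run on a hypothetical <code>cities</code> block, the pattern stops right before the opening parenthesis, which leaves a trailing space on each name. That is why the comprehension calls <code>rstrip()</code>:</p>

```python
import re

cities_rx = re.compile(r'^[^()\n]+', re.MULTILINE)

# Hypothetical "cities" block as captured by the state pattern
cities_block = ('\nAuburn (Auburn University)[1]\n'
                'Florence (University of North Alabama)\n')

raw = [m.group(0) for m in cities_rx.finditer(cities_block)]
print(raw)      # ['Auburn ', 'Florence ']  (trailing space before "(")
cleaned = [city.rstrip() for city in raw]
print(cleaned)  # ['Auburn', 'Florence']
```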
<p><strong>Fourth subtask</strong></p>
<pre><code>result = {state.group('state'): [city.group(0).rstrip() for city in cities_rx.finditer(state.group('cities'))] for state in states_rx.finditer(transformed)}
</code></pre>
<p>This is roughly equivalent to:</p>
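<pre><code>result = {}
for state in states_rx.finditer(transformed):
    # the state name is in state.group('state')
    cities = []
    for city in cities_rx.finditer(state.group('cities')):
        # the city is in city.group(0), possibly with trailing whitespace,
        # hence the rstrip()
        cities.append(city.group(0).rstrip())
    result[state.group('state')] = cities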
</code></pre>
<p/><hr/>
<p>Finally, some timings:</p>
<pre><code>import timeit
# findstatesandcities: presumably the code above wrapped in a function (not shown)
print(timeit.timeit(findstatesandcities, number=10**5))
# 12.234304904000965
</code></pre>
<p>So on my computer, running the above <strong>100000</strong> times takes about 12 seconds, i.e. roughly 0.12 ms per run, so it should be reasonably fast.</p>
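<p>The answer doesn't show <code>findstatesandcities</code>. Assuming it simply wraps the code above in a zero-argument function, a timing setup could look roughly like this; the function name <code>find_states_and_cities</code>, the sample data and the iteration count are all hypothetical. The patterns are compiled once at module level so that only the joining and matching are timed.</p>

```python
import re
import timeit

STATES_RX = re.compile(r'''
^
(?P<state>.+?)\[edit\]
(?P<cities>[\s\S]+?)
(?=^.*\[edit\]$|\Z)
''', re.MULTILINE | re.VERBOSE)
CITIES_RX = re.compile(r'^[^()\n]+', re.MULTILINE)

# Hypothetical sample input; substitute the real list from the question
LST = (
    'Alabama[edit]',
    'Auburn (Auburn University)[1]',
    'Alaska[edit]',
    'Fairbanks (University of Alaska Fairbanks)[2]',
)


def find_states_and_cities(lines=LST):
    """Hypothetical wrapper around the four steps; returns {state: [cities]}."""
    transformed = '\n'.join(lines)
    return {state.group('state'): [city.group(0).rstrip()
                                   for city in CITIES_RX.finditer(state.group('cities'))]
            for state in STATES_RX.finditer(transformed)}


# timeit.timeit accepts a zero-argument callable and returns the total seconds
elapsed = timeit.timeit(find_states_and_cities, number=10_000)
print(f'{elapsed:.3f} s for 10,000 runs')
```

<p>Absolute timings depend on the machine and on the size of the input, so numbers from this small sample are not comparable to the figure above.</p>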