<p>Here is a quick-and-dirty way to do it (you will still need to clean this up, but I think your answer is in here):</p>
<pre><code>import re

with open('Departed.txt', 'r') as f:
    data = f.read()

# match all words or sequences of words that are all caps
scene_or_character_re = re.compile(r'\b([A-Z][A-Z\W]+)\W')
groupings = scene_or_character_re.split(data)
# groupings is a list of strings: the text before the first match,
# then alternating caps, normal, caps, normal

def cleanup_spaces(s):
    '''helper function to collapse runs of whitespace into a single space'''
    return re.sub(r'\s\s+', ' ', s).strip()

# split list into tuples of length two (caps, normal)
pointer = iter(groupings)
groups = []
for p in pointer:
    if cleanup_spaces(p) == '':
        continue  # skip blank lines
    actor = cleanup_spaces(p)
    # this also increments the iterator used by the `for` loop;
    # the '' default avoids StopIteration on an unpaired last item
    line = cleanup_spaces(next(pointer, ''))
    groups.append((actor, line))
</code></pre>
<p>This gives you <code>groups</code>, a list of <code>(actor, line)</code> tuples.</p>
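<p>As one possible cleanup step (a sketch, not part of the approach above): if the text before the first all-caps match is not blank, the loop can pair items off by one. Since <code>re.split</code> with a single capture group puts the captured matches at odd indices of the result, you could slice the list instead of walking an iterator. The <code>pair_groups</code> helper and the sample text below are made up for illustration, and the regex is unchanged, so it still needs the same tuning on a real script:</p>

```python
import re

# Same pattern and whitespace helper as in the answer above.
scene_or_character_re = re.compile(r'\b([A-Z][A-Z\W]+)\W')


def cleanup_spaces(s):
    """Collapse runs of two or more whitespace characters into one space."""
    return re.sub(r'\s\s+', ' ', s).strip()


def pair_groups(text):
    """Hypothetical helper: pair each all-caps match with the text after it."""
    parts = scene_or_character_re.split(text)
    # parts[0] is whatever precedes the first match and is ignored here.
    headings = parts[1::2]  # captured all-caps matches (odd indices)
    bodies = parts[2::2]    # text following each match (even indices)
    return [(cleanup_spaces(h), cleanup_spaces(b))
            for h, b in zip(headings, bodies)]


# Made-up sample standing in for Departed.txt.
sample = "COLIN\nNever better. How are you?\n\nBILLY\nCould be worse.\n"

groups = pair_groups(sample)
print(groups)
# [('COLIN', 'Never better. How are you?'), ('BILLY', 'Could be worse.')]
```

<p>Unlike the loop, this keeps a heading with an empty body as <code>(heading, '')</code> instead of skipping it, so filter those out afterwards if you do not want them.</p>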