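<p>With a pattern this simple there is no need for regex at all, and certainly no need to run over the same data once per language; you can stream-parse the input and write the results on the fly:</p>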
<pre><code>with open("input.txt", "r") as f:  # open the input file
    file_handles = {}  # a map of our individual output file handles
    for line in f:  # read it line by line
        rindex = line.rfind("@")  # find the last `@` character
        if rindex != -1:  # char found, consider the line...
            language = line[rindex+1:rindex+3]  # grab the following two characters as language
            lindex = line.rfind("\"", 0, rindex-1)  # find the preceding quotation
            if lindex != -1:  # found, we have a match
                if language not in file_handles:  # add a file handle for this language:
                    file_handles[language] = open("outputfile-{}.txt".format(language), "w")
                # write the found slice between `lindex` and `rindex` + a new line
                file_handles[language].write(line[lindex+1:rindex-1] + "\n")
    for handle in file_handles.values():  # close our output file handles
        handle.close()
</code></pre>
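<p>This should be considerably faster than the regex approach and, as a bonus, it works with any language tag: if the input contains <code>...@it</code> lines, they are written to <code>outputfile-it.txt</code> as well.</p>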
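<p>To make the slicing easier to follow, here is a minimal sketch of the same index logic applied to a single line. The helper name <code>extract_literal</code> and the sample line are made up for illustration; the sketch assumes each matching line ends in a quoted string directly followed by <code>@</code> and a two-letter language code, which may not match your real input exactly.</p>

```python
def extract_literal(line):
    """Return (language, text) for a line containing `"text"@xx`, else None."""
    rindex = line.rfind("@")  # position of the last `@` in the line
    if rindex == -1:
        return None  # no language marker at all
    # search only up to (not including) the closing quote at rindex - 1,
    # so the quote we find is the opening one
    lindex = line.rfind('"', 0, rindex - 1)
    if lindex == -1:
        return None  # no opening quote before the marker
    language = line[rindex + 1:rindex + 3]  # the two characters after `@`
    return language, line[lindex + 1:rindex - 1]  # text between the quotes


# Hypothetical sample line; the real input format is an assumption here.
sample = '<http://example.org/s> <http://example.org/p> "Hello world"@en .\n'
print(extract_literal(sample))  # ('en', 'Hello world')
```

<p>Note that the search for the opening quote stops just before the closing one, so a text that itself contains a quotation mark would be cut at that inner quote.</p>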