<p>This calls for a regular expression with sub-group matching.
(<a href="https://docs.python.org/3.5/library/re.html#match-objects" rel="nofollow noreferrer">https://docs.python.org/3.5/library/re.html#match-objects</a>)</p>
<p>My test file <code>data.txt</code>:</p>
<pre><code>QWEEEFVAJFLDVAJPQDVAJDSNJKVAJGHD
AFVAJFLDVAJPQDVAJDSNJKHFGHERQWFS
ONLY_TWO_VAJsOOVAJ123VAQQWERTY
START_VAJs_with_more_VAJ123VAJ_space_between
AAPVAJRCGVAJJKYVAJJJJJJJJVAJOOOO
AAPVAJRCGVAJJKYVAJJJJJJJJQQQOOOOO
</code></pre>
<p>Python code:</p>
<pre><code>import re

pattern = "VAJ"
# Builds the expression VAJ...(VAJ...(VAJ(.*))) with three nested groups.
re_str = pattern + "..." + "(" + pattern + "..." + "(" + pattern + "(.*)))"
regex = re.compile(re_str)
regex_extra = re.compile(pattern + ".*")

with open("data.txt") as data_file:
    for line in data_file:
        line = line.strip()
        match = regex.search(line)
        if match:
            result = list()
            result.append(match.group(0))  # entire regex match
            result.append(match.group(1))  # outer parenthesized group
            result.append(match.group(2))  # middle parenthesized group
            # The innermost parenthesized group contains the rest of the line.
            # Use it to find one extra occurrence of the pattern.
            the_rest = match.group(3)
            match_extra = regex_extra.search(the_rest)
            if match_extra:  # one more occurrence in the rest of the line
                result.append(match_extra.group(0))  # add it to the result list
            # Output
            print(result)
</code></pre>
<p>Output:</p>
<pre><code>['VAJFLDVAJPQDVAJDSNJKVAJGHD', 'VAJPQDVAJDSNJKVAJGHD', 'VAJDSNJKVAJGHD', 'VAJGHD']
['VAJFLDVAJPQDVAJDSNJKHFGHERQWFS', 'VAJPQDVAJDSNJKHFGHERQWFS', 'VAJDSNJKHFGHERQWFS']
['VAJRCGVAJJKYVAJJJJJJJJVAJOOOO', 'VAJJKYVAJJJJJJJJVAJOOOO', 'VAJJJJJJJJVAJOOOO', 'VAJOOOO']
['VAJRCGVAJJKYVAJJJJJJJJQQQOOOOO', 'VAJJKYVAJJJJJJJJQQQOOOOO', 'VAJJJJJJJJQQQOOOOO']
</code></pre>
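<p>To make the group numbering concrete, here is a minimal sketch that applies the same expression to the first line of <code>data.txt</code> and prints where each group starts. Groups are numbered by the position of their opening parenthesis, so group 1 is the outermost and group 3 the innermost. The loop variable <code>number</code> is only an illustrative name.</p>

```python
import re

pattern = "VAJ"
# Same expression as in the answer: VAJ...(VAJ...(VAJ(.*)))
regex = re.compile(pattern + "..." + "(" + pattern + "..." + "(" + pattern + "(.*)))")

line = "QWEEEFVAJFLDVAJPQDVAJDSNJKVAJGHD"  # first line of data.txt
match = regex.search(line)

# regex.groups is the number of capturing groups (3 here); group 0 is the whole match.
for number in range(regex.groups + 1):
    print(number, match.start(number), match.group(number))
# 0 6 VAJFLDVAJPQDVAJDSNJKVAJGHD
# 1 12 VAJPQDVAJDSNJKVAJGHD
# 2 18 VAJDSNJKVAJGHD
# 3 21 DSNJKVAJGHD
```

<p>Because each <code>...</code> matches exactly three characters, a line only matches when three occurrences of the pattern are spaced that way, which is why the third and fourth lines of <code>data.txt</code> produce no output.</p>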
<p>The size of the file is not a problem for this code, as long as the longest line fits in memory a few times over.</p>
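<p>One way to make the line-at-a-time behaviour explicit is to wrap the same logic in a generator, sketched below. The function name <code>iter_results</code> and the in-memory <code>sample</code> list are assumptions for illustration, not part of the code above; any iterable of lines, including an open file object, could be passed in.</p>

```python
import re

PATTERN = "VAJ"
REGEX = re.compile(PATTERN + "..." + "(" + PATTERN + "..." + "(" + PATTERN + "(.*)))")
REGEX_EXTRA = re.compile(PATTERN + ".*")


def iter_results(lines):
    """Hypothetical helper: yield one result list per matching line.

    Only the current line and its matches are held at any time.
    """
    for line in lines:
        match = REGEX.search(line.strip())
        if not match:
            continue
        result = [match.group(0), match.group(1), match.group(2)]
        # Look for one more occurrence in the rest of the line.
        match_extra = REGEX_EXTRA.search(match.group(3))
        if match_extra:
            result.append(match_extra.group(0))
        yield result


# Stand-in for a file: two lines taken from data.txt.
sample = ["AFVAJFLDVAJPQDVAJDSNJKHFGHERQWFS\n", "ONLY_TWO_VAJsOOVAJ123VAQQWERTY\n"]
for result in iter_results(sample):
    print(result)
```

<p>With a real file this would be called as <code>iter_results(data_file)</code> inside a <code>with open("data.txt") as data_file:</code> block, so lines are read lazily rather than loaded all at once.</p>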