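<p>The Python standard library includes the <code>difflib</code> module for computing differences between sequences. Something like this should work, although the result is slightly different from what you expected:</p>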
<pre><code>pos1 = ['ROOT', 'SBARQ', 'WHADVP', 'WRB', 'SQ', 'VP', 'VBP', 'ADJP', 'RB', 'JJ', 'NP', 'NNP', 'NP', 'NP', 'NNS', 'VP', 'VBG', 'NP', 'NP', 'NNS', 'SBAR', 'WHNP', 'WDT', 'S', 'VP', 'VBP', 'ADVP', 'RB', 'VP', 'VBN', 'PP', 'IN', 'NP', 'NNP', '.']
pos2 = ['ROOT', 'SBARQ', 'WHADVP', 'WRB', 'SQ', 'VBP', 'NP', 'NNS', 'VP', 'VB', 'NP', 'NP', 'NNP', 'NNS', 'SBAR', 'WHNP', 'WDT', 'S', 'VP', 'MD', 'VP', 'VB', 'VP', 'VBN', 'ADVP', 'RB', 'PP', 'IN', 'NP', 'NNP', '.']
from difflib import SequenceMatcher
sm = SequenceMatcher(a=pos1, b=pos2)
for diff in sm.get_opcodes():
    # uncomment this to see all the diffs
    # print(diff)
    op, f1_from, f1_to, f2_from, f2_to = diff
    if op == 'equal':
        # print the run length followed by the tags shared by both lists
        print("{}{}".format(f1_to-f1_from, tuple(pos1[f1_from:f1_to])))
</code></pre>
<p>This gives:</p>
<pre><code>5('ROOT', 'SBARQ', 'WHADVP', 'WRB', 'SQ')
1('VBP',)
3('NP', 'NNS', 'VP')
2('NP', 'NP')
6('NNS', 'SBAR', 'WHNP', 'WDT', 'S', 'VP')
2('ADVP', 'RB')
5('PP', 'IN', 'NP', 'NNP', '.')
</code></pre>
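<p>If you only need the common runs and not the full list of edit operations, <code>SequenceMatcher.get_matching_blocks()</code> may be a slightly more direct route. The following is a minimal sketch, not the only way to do it; the helper name <code>common_runs</code> and the short sample lists are made up for illustration:</p>

```python
from difflib import SequenceMatcher


def common_runs(a, b):
    """Return the runs of consecutive items that difflib matches in a and b.

    `common_runs` is a hypothetical helper name used only for this sketch.
    """
    sm = SequenceMatcher(a=a, b=b)
    runs = []
    # Each block is a named tuple (a, b, size): a[a:a+size] == b[b:b+size].
    # The final block is always a dummy with size 0, so skip empty blocks.
    for block in sm.get_matching_blocks():
        if block.size:
            runs.append(tuple(a[block.a:block.a + block.size]))
    return runs


if __name__ == "__main__":
    # Small made-up sample, not the lists from the question
    left = ['a', 'b', 'c', 'd']
    right = ['a', 'b', 'x', 'd']
    for run in common_runs(left, right):
        print(len(run), run)
    # 2 ('a', 'b')
    # 1 ('d',)
```

<p>Like <code>get_opcodes()</code>, this reports the matches <code>difflib</code> settles on with its own heuristic, which is not guaranteed to be a longest common subsequence, so the runs can still differ from what you would pick by hand.</p>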