<p>You're not too far off. One way to make this easier is to get the indices of every <code>'IN'</code> and <code>'TO'</code>:</p>
<pre><code>starts = {'IN', 'TO'}
in_twos = [i for i, e in enumerate(new) if e in starts]
</code></pre>
<p>Which gives:</p>
<pre><code>[2, 8, 10, 15]
</code></pre>
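<p>Then you just need to loop over those indices, looking at the slice <code>new[i+1:]</code>, and collect the <code>'NN'</code> or <code>'NNP'</code> elements. As soon as you reach an element that is not one of these, <code>break</code> exits the inner loop.</p>
<p>For example:</p>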
<pre><code>result = []
take = {'NN', 'NNP'}
for i in in_twos:
    temp = []
    for x in new[i+1:]:
        if x not in take:
            break
        temp.append(x)
    # If this is empty, don't add it
    if temp:
        result.append(temp)
print(result)
</code></pre>
<p>Final output:</p>
<pre><code>[['NNP', 'NN'], ['NN'], ['NNP', 'NN', 'NN']]
</code></pre>
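<p>Another, shorter approach, as suggested by @schwobasegll, is to use <a href="https://docs.python.org/3/library/itertools.html#itertools.takewhile" rel="nofollow noreferrer"><code>itertools.takewhile</code></a> to simplify extracting the <code>'NN'</code> elements. This function keeps yielding elements until the predicate passed as its first argument returns false.</p>
<p>Here is what it looks like:</p>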
<pre><code>from itertools import takewhile
# new, take and in_twos same as before
result = [l for l in [list(takewhile(lambda x: x in take, new[i+1:])) for i in in_twos] if l]
print(result)
# [['NNP', 'NN'], ['NN'], ['NNP', 'NN', 'NN']]
</code></pre>
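<p>To see the stopping behavior of <code>takewhile</code> in isolation, here is a minimal sketch. The list <code>tags_after_in</code> is a made-up sample, not data from the question; it stands in for one slice <code>new[i+1:]</code>.</p>

```python
from itertools import takewhile

take = {'NN', 'NNP'}

# Hypothetical sample: the tags that follow one 'IN' token.
tags_after_in = ['NNP', 'NN', 'MD', 'VB', 'NN']

# takewhile stops at the first element that fails the predicate ('MD'),
# so the trailing 'NN' is never reached.
run = list(takewhile(lambda tag: tag in take, tags_after_in))
print(run)  # ['NNP', 'NN']
```

<p>Note that only the leading run is kept, which matches what the explicit loop with <code>break</code> does.</p>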
<p><strong>Update:</strong></p>
<p>If you want to map the words and their part-of-speech tags together, you can do the following:</p>
<pre><code>new = [['JJ', 'NN', 'IN','NNP','NN','MD','VB','VBN','IN','NN','TO','VB','NN','CC','NN','TO','NNP','NN','NN','.'],
['Additional','condition','of','DeNOx','activation','shall','be','introduced','in', 'order','to','provide','flexibility','and','robustness', 'to','NSC','regeneration','management','.']]
starts = {'IN', 'TO'}
in_twos = [i for i, e in enumerate(new[0]) if e in starts]
speech = []
words = []
take = {'NN', 'NNP'}
for i in in_twos:
    temp = []
    for x, y in zip(new[0][i+1:], new[1][i+1:]):
        if x not in take:
            break
        temp.append((x, y))
    # If this is empty, don't add it
    if temp:
        speech.append([x for x, _ in temp])
        words.append([y for _, y in temp])
print(speech)
print(words)
</code></pre>
<p>Output:</p>
<pre><code>[['NNP', 'NN'], ['NN'], ['NNP', 'NN', 'NN']]
[['DeNOx', 'activation'], ['order'], ['NSC', 'regeneration', 'management']]
</code></pre>
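<p>If you need this in more than one place, the two ideas can be combined into a small helper. This is only a sketch: the name <code>noun_runs</code> and its parameters are my own invention, and it assumes the tag list and the word list have the same length.</p>

```python
from itertools import takewhile

def noun_runs(tags, words, starts=frozenset({'IN', 'TO'}),
              take=frozenset({'NN', 'NNP'})):
    """Hypothetical helper: return (tag_run, word_run) pairs for each
    run of `take` tags that directly follows a `starts` tag."""
    runs = []
    for i, tag in enumerate(tags):
        if tag not in starts:
            continue
        # Pair each tag with its word, then keep pairs while the tag qualifies.
        pairs = list(takewhile(lambda p: p[0] in take,
                               zip(tags[i+1:], words[i+1:])))
        if pairs:  # skip start tags not followed by a noun
            tag_run, word_run = zip(*pairs)
            runs.append((list(tag_run), list(word_run)))
    return runs

tags = ['JJ', 'NN', 'IN', 'NNP', 'NN', 'MD', 'VB', 'VBN', 'IN', 'NN',
        'TO', 'VB', 'NN', 'CC', 'NN', 'TO', 'NNP', 'NN', 'NN', '.']
words = ['Additional', 'condition', 'of', 'DeNOx', 'activation', 'shall',
         'be', 'introduced', 'in', 'order', 'to', 'provide', 'flexibility',
         'and', 'robustness', 'to', 'NSC', 'regeneration', 'management', '.']

for tag_run, word_run in noun_runs(tags, words):
    print(tag_run, word_run)
# ['NNP', 'NN'] ['DeNOx', 'activation']
# ['NN'] ['order']
# ['NNP', 'NN', 'NN'] ['NSC', 'regeneration', 'management']
```

<p>Keeping each tag run next to its word run avoids having two parallel lists that must stay in sync.</p>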