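<p>I'm trying to create an algo that iterates over a sequence of strings, joins them together if they meet a certain condition, and then skips ahead by the number of strings joined, to avoid double-counting parts of the same joined string.</p>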
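<p>I know that <code>i = i + x</code> or <code>i += x</code> doesn't change which iteration a <code>for</code> loop runs next, so I'm looking for an alternative way to skip a variable number of iterations.</p>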
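<p>To illustrate what I mean, here is a minimal sketch (the function names are made up for this example). In the <code>for</code> loop, whatever I assign to <code>i</code> is overwritten by the next value from <code>range()</code>, whereas a <code>while</code> loop that advances its own index does appear to skip:</p>

```python
def for_loop_indices(n, skip):
    """Indices visited by a for loop that tries to skip ahead."""
    visited = []
    for i in range(n):
        visited.append(i)
        i += skip  # no lasting effect: range() supplies the next value regardless
    return visited


def while_loop_indices(n, skip):
    """Indices visited by a while loop that advances its own index."""
    visited = []
    i = 0
    while i < n:
        visited.append(i)
        i += skip  # effective: the loop condition re-reads this i
    return visited


print(for_loop_indices(6, 2))    # [0, 1, 2, 3, 4, 5]
print(while_loop_indices(6, 2))  # [0, 2, 4]
```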
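<p>Background: I'm trying to create a named-entity recognition algorithm for news articles. I tokenize the text <code>('Prime Minister Jacinda Ardern is from New Zealand')</code> into <code>('Prime','Minister','Jacinda','Ardern','is'...)</code> and run the NLTK part-of-speech tagger over it, which gives: ...<code>(('Jacinda','NNP'),('Ardern','NNP'),('is','VBZ')...</code>, and then I combine words whenever the following word is also 'NNP' (a proper noun).</p>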
<p>The goal is to count "Prime Minister Jacinda Ardern" as 1 string rather than 4, and then skip the loop ahead by that many words, so that the next strings are not "Minister Jacinda Ardern" and then "Jacinda Ardern".</p>
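<p>Context:
"text" is a list of lists, produced by tokenizing my article and then POS-tagging it, in the format: <code>[...('She', 'PRP'), ('said', 'VBD'), ('the', 'DT'), ('roughly', 'RB'), ('25-minute', 'JJ'), ('meeting', 'NN')...]</code>
'NNP' = proper noun, i.e. the name of a place/person/organisation etc.</p>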
<pre><code>for (i) in range(len(text)):
    print(i)
    #initialising wordcounter as a variable
    wordcounter = 0
    # if text[i] is a Proper Noun, make namedEnt = the word.
    # then increase wordcounter by 1
    if text[i][1] == 'NNP':
        namedEnt = text[i][0]
        wordcounter +=1
        # while the next word in text is also a Proper Noun,
        # increase wordcounter by 1. Initialise J as = 1
        while text[i + wordcounter][1] == 'NNP':
            wordcounter +=1
            j = 1
            # While J is less than wordcounter, join text[i+j] to
            # namedEnt. Increase J by 1. When that is no longer
            # the case append namedEnt to a namedEntity list
            while j < wordcounter:
                namedEnt = ' '.join([namedEnt,text[i+j][0]])
                j += 1
        InitialNamedEntity.append(namedEnt)
        i += wordcounter
</code></pre>
<p>If I <code>print(i)</code> at the start, it goes up by 1 each time. When I print a counter of the NamedEntity list made up of the namedEnts, I get the following result:
<code>(...'New Zealand': 7, 'Zealand': 7, 'United': 4, 'Prime Minister Minister Jacinda Minister Jacinda Ardern': 3...)</code></p>
<p>So not only am I double-counting "New Zealand" and "Zealand", I'm also getting wacky results like "Prime Minister Minister Jacinda Minister Jacinda Ardern".</p>
<p>The result I'd like is <code>('New Zealand':7, 'United States':4,'Prime Minister Jacinda Ardern':3)</code></p>
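<p>One direction that might get there, as a rough sketch rather than a finished solution: replace the <code>for</code> loop with a <code>while</code> loop that owns its index, find the end of each run of 'NNP' tokens, join the run once, and jump the index past it. The names <code>extract_named_entities</code> and <code>sample</code> are hypothetical, and the tags in <code>sample</code> are hand-written for the example rather than real NLTK output. The bounds check in the inner loop is there because indexing past the end of the list would raise an <code>IndexError</code> when the text ends on a proper noun.</p>

```python
from collections import Counter


def extract_named_entities(tagged):
    """Join each maximal run of consecutive 'NNP' tokens into one string."""
    entities = []
    i = 0
    while i < len(tagged):
        if tagged[i][1] != 'NNP':
            i += 1
            continue
        # Find the end of this run of proper nouns, checking bounds first.
        end = i
        while end < len(tagged) and tagged[end][1] == 'NNP':
            end += 1
        entities.append(' '.join(word for word, _tag in tagged[i:end]))
        i = end  # skip past the whole run so no suffix is counted again
    return entities


# Hand-tagged sample (assumed tags, not produced by running NLTK here).
sample = [('Prime', 'NNP'), ('Minister', 'NNP'), ('Jacinda', 'NNP'),
          ('Ardern', 'NNP'), ('is', 'VBZ'), ('from', 'IN'),
          ('New', 'NNP'), ('Zealand', 'NNP')]

print(Counter(extract_named_entities(sample)))
```

<p>On this sample each run is joined exactly once, so "Zealand" and "Minister Jacinda Ardern" should no longer show up as separate entries.</p>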
<p>Any help would be much appreciated. Cheers</p>