Answers to the question: Generate unique IDs for lists containing words


1 answer

Anonymous, 1 day ago

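You have two errors. First, you have a simple typo, here:

<pre><code>for word1, word2 in labels:
    # bug: word1 is looked up twice
    ids.append([word_to_id [word1], word_to_id [word1]])
</code></pre>

You are adding the id for <code>word1</code> twice there. Correct the second <code>word1</code> so that it looks up <code>word2</code> instead.

Next, you are not testing whether you have seen a word before, so for <code>'Kleiber'</code> you first give it the id <code>4</code>, then overwrite that entry with <code>6</code> on the next iteration. You need to number the unique words, not every occurrence of a word:

<pre><code>counter = 0
for word in vocabulary:
    # only assign an id the first time a word is seen
    if word not in word_to_id:
        word_to_id[word] = counter
        counter += 1
</code></pre>

Alternatively, you could simply not add a word to <code>vocabulary</code> if it is already listed there. As a side note, you don't really need a separate <code>vocabulary</code> list. A separate loop doesn't buy you anything, so the following works too:

<pre><code>word_to_id = {}
counter = 0
for words in labels:
    for word in words:
        # skip words that already have an id
        if word not in word_to_id:
            word_to_id[word] = counter
            counter += 1
</code></pre>

You can simplify your code quite a bit by using a <a href="https://docs.python.org/3/library/collections.html#collections.defaultdict" rel="nofollow noreferrer"><code>defaultdict</code> object</a> and <a href="https://docs.python.org/3/library/itertools.html#itertools.count" rel="nofollow noreferrer"><code>itertools.count()</code></a> to supply default values:

<pre><code>from collections import defaultdict
from itertools import count

def words_to_ids(labels):
    # a missing key gets the next integer from count() as its value
    word_ids = defaultdict(count().__next__)
    return [[word_ids[w1], word_ids[w2]] for w1, w2 in labels]
</code></pre>

The <code>count()</code> object gives you the next integer in a series each time <code>__next__</code> is called, and <code>defaultdict()</code> calls that method each time you try to access a key that doesn't yet exist in the dictionary. Together, they ensure a unique id for each unique word.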
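To check that the explicit loop and the <code>defaultdict</code> version agree, here is a small self-contained sketch. The <code>labels</code> data and the helper name <code>words_to_ids_loop</code> are made up for illustration and are not taken from the original question; only <code>words_to_ids</code> comes from the answer above.

```python
from collections import defaultdict
from itertools import count


def words_to_ids_loop(labels):
    """Explicit version (hypothetical helper): assign an id only on first sight."""
    word_to_id = {}
    ids = []
    for word1, word2 in labels:
        pair = []
        for word in (word1, word2):
            if word not in word_to_id:
                # the current dict size is the next unused id
                word_to_id[word] = len(word_to_id)
            pair.append(word_to_id[word])
        ids.append(pair)
    return ids


def words_to_ids(labels):
    """defaultdict version from the answer above."""
    word_ids = defaultdict(count().__next__)
    return [[word_ids[w1], word_ids[w2]] for w1, w2 in labels]


# Hypothetical sample data: 'Kleiber' and 'Carlos' each appear twice.
labels = [["Carlos", "Kleiber"], ["Erich", "Kleiber"], ["Carlos", "Santana"]]

print(words_to_ids_loop(labels))  # [[0, 1], [2, 1], [0, 3]]
print(words_to_ids(labels))       # [[0, 1], [2, 1], [0, 3]]
```

Both functions give a repeated word the id it received the first time it appeared, so <code>'Kleiber'</code> maps to <code>1</code> in both pairs that contain it.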
