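<p>I have implemented an example that computes unigrams, bigrams, and trigrams. You can use <code>zip</code> to join neighbouring items easily. <code>Counter</code> counts the items, and a <code>defaultdict</code> holds their probabilities. The <code>defaultdict</code> matters because it returns zero when a key is not in the collection. Otherwise you would have to add an <code>if</code> clause to avoid a <code>KeyError</code> (or a <code>None</code> from <code>dict.get</code>).</p>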
<pre class="lang-py prettyprint-override"><code>from collections import Counter, defaultdict


def calculate_grams(items_list):
    # count how often each item occurs
    counts = Counter(items_list)
    # relative frequencies; defaultdict(float) returns 0.0 for unseen items
    prob = defaultdict(float)
    for item, count in counts.most_common():
        prob[item] = count / len(items_list)
    return prob


def calculate_bigrams(words):
    # pair each word with its successor
    return calculate_grams(list(zip(words, words[1:])))


def calculate_trigrams(words):
    # group each word with its next two successors
    return calculate_grams(list(zip(words, words[1:], words[2:])))


dataset = ['a', 'b', 'b', 'c', 'a', 'a', 'a', 'b', 'e', 'e', 'c']

# vocabulary, sorted so the output order is deterministic
dictionary = sorted(set(dataset))
print("Dictionary", dictionary)

unigrams = calculate_grams(dataset)
print("Unigrams", unigrams)
bigrams = calculate_bigrams(dataset)
print("Bigrams", bigrams)
trigrams = calculate_trigrams(dataset)
print("Trigrams", trigrams)

# Testing: score every candidate word that could follow test_words
test_words = ['a', 'b']
print("Testing", test_words)
for c in dictionary:
    # look up each n-gram probability (0.0 if the n-gram was never seen)
    unigram_prob = unigrams[c]
    bigram_prob = bigrams[(test_words[-1], c)]
    trigram_prob = trigrams[(test_words[-2], test_words[-1], c)]
    # weighted sum; the weights are arbitrary and add up to 0.8,
    # so the result is a score rather than a normalized probability
    prob = .2 * unigram_prob + .2 * bigram_prob + .4 * trigram_prob
    print(f"{c} {prob:.4f}")
</code></pre>
<p>Output:</p>
<pre><code>Dictionary ['a', 'b', 'c', 'e']
Unigrams defaultdict(&lt;class 'float'&gt;, {'a': 0.36363636363636365, 'b': 0.2727272727272727, 'c': 0.18181818181818182, 'e': 0.18181818181818182})
Bigrams defaultdict(&lt;class 'float'&gt;, {('a', 'b'): 0.2, ('a', 'a'): 0.2, ('b', 'b'): 0.1, ('b', 'c'): 0.1, ('c', 'a'): 0.1, ('b', 'e'): 0.1, ('e', 'e'): 0.1, ('e', 'c'): 0.1})
Trigrams defaultdict(&lt;class 'float'&gt;, {('a', 'b', 'b'): 0.1111111111111111, ('b', 'b', 'c'): 0.1111111111111111, ('b', 'c', 'a'): 0.1111111111111111, ('c', 'a', 'a'): 0.1111111111111111, ('a', 'a', 'a'): 0.1111111111111111, ('a', 'a', 'b'): 0.1111111111111111, ('a', 'b', 'e'): 0.1111111111111111, ('b', 'e', 'e'): 0.1111111111111111, ('e', 'e', 'c'): 0.1111111111111111})
Testing ['a', 'b']
a 0.0727
b 0.1190
c 0.0564
e 0.1008
</code></pre>
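<p>If you need other orders than 2 and 3, the same <code>zip</code> trick generalizes. The following is only a sketch, not part of the code above: the helper names <code>ngrams</code> and <code>ngram_probabilities</code> are hypothetical, and unigrams come back as 1-tuples such as <code>('a',)</code> instead of bare strings.</p>
<pre class="lang-py prettyprint-override"><code>from collections import Counter, defaultdict


def ngrams(words, n):
    """Return all runs of n consecutive items as tuples (hypothetical helper)."""
    # zip stops at the shortest slice, so incomplete runs at the end are dropped
    return list(zip(*(words[i:] for i in range(n))))


def ngram_probabilities(words, n):
    """Relative frequency of each n-gram; unseen n-grams read as 0.0."""
    grams = ngrams(words, n)
    prob = defaultdict(float)
    for gram, count in Counter(grams).items():
        prob[gram] = count / len(grams)
    return prob


dataset = ['a', 'b', 'b', 'c', 'a', 'a', 'a', 'b', 'e', 'e', 'c']
print(ngrams(dataset, 3)[:2])                       # [('a', 'b', 'b'), ('b', 'b', 'c')]
print(ngram_probabilities(dataset, 2)[('a', 'b')])  # 0.2
</code></pre>
<p>One thing to keep in mind with this design: indexing a <code>defaultdict</code> with a missing key also stores that key with the value 0.0, so the table grows as you query it. Use <code>prob.get(key, 0.0)</code> if that matters.</p>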