<p>I define an "uncommon word" as a word that does not appear among the 10,000 most common English words.</p>
<p>The 10K cutoff is an arbitrary boundary, but as <a href="https://github.com/first20hours/google-10000-english" rel="nofollow noreferrer">the github repo</a> notes:</p>
<blockquote>
<p>According to analysis of the Oxford English Corpus, the 7,000 most common English lemmas account for approximately 90% of usage, so a 10,000 word training corpus is more than sufficient for practical training applications.</p>
</blockquote>
<pre><code>import requests

# URL of the 10k most common English words (plain-text file in the github repo)
english_most_common_10k = 'https://raw.githubusercontent.com/first20hours/google-10000-english/master/google-10000-english-usa-no-swears.txt'

response = requests.get(english_most_common_10k)
data = response.text
# Build a set of the common words, skipping any empty lines
set_of_common_words = {line.strip() for line in data.split('\n') if line.strip()}

# Once we have the set of common words, we can just check membership.
# The lookup is an O(1) operation on average;
# you could instead use e.g. a search tree with O(log(n)) complexity.
while True:
    word = input().strip().lower()  # the word list is all lowercase
    if word in set_of_common_words:
        print(f'The word "{word}" is common')
    else:
        print(f'The word "{word}" is difficult')
</code></pre>
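<p>If you want the check in a non-interactive form (e.g. to flag every difficult word in a sentence), the lookup can be wrapped in a small helper. This is only a sketch: the <code>is_common</code> function name and the tiny stand-in word set below are illustrative, not part of the original answer; in practice you would pass in the <code>set_of_common_words</code> built above.</p>
<pre><code>def is_common(word, common_words):
    """Return True if `word` is in the common-word set (case-insensitive)."""
    return word.lower() in common_words

# Tiny stand-in for the real 10k set fetched above (assumption for the demo)
demo_common = {'the', 'quick', 'brown', 'fox'}

sentence = 'The quick sesquipedalian fox'
# Collect the words not found in the common set
difficult = [w for w in sentence.split() if not is_common(w, demo_common)]
print(difficult)  # → ['sesquipedalian']
</code></pre>
<p>Because the helper lowercases its input, capitalized words such as sentence-initial "The" still match the all-lowercase word list.</p>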