<h2>局限性</h2>
<p>如果你拒绝使用字典,你的算法将需要大量的计算。除此之外,不可能区分只出现一次的关键字(例如:“karl”)和糟糕的序列(例如:“e2bo”)。我的解决方案将是一个最大的努力,只有当你的网址列表包含关键字多次。在</p>
<h2>基本思想</h2>
<p>我假设一个单词是一系列频繁出现的字符,至少有3个字符。这就阻止了字母“o”成为最流行的单词。在</p>
<p>基本思路如下。在</p>
<ul>
<li>对所有n个字母序列进行计数,并选择多次出现的一次。在</li>
<li>剪切所有属于一个较大序列的序列。在</li>
<li>按受欢迎程度排序,你就有了一个接近解决问题的解决方案。(留给读者练习)</li>
</ul>
<h2>在代码中</h2>
<pre><code>import operator
sentences = ["davidbobmike1joe" , "mikejoe2bobkarl", "joemikebob", "bobjoe", "bobbyisawesome", "david", "bobbyjoe"];
dict = {}
def countWords(n):
"""Count all possible character sequences/words of length n occuring in all given sentences"""
for sentence in sentences:
countWordsSentence(sentence, n);
def countWordsSentence(sentence, n):
"""Count all possible character sequence/words of length n occuring in a sentence"""
for i in range(0,len(sentence)-n+1):
word = sentence[i:i+n]
if word not in dict:
dict[word] = 1;
else:
dict[word] = dict[word] +1;
def cropDictionary():
"""Removes all words that occur only once."""
for key in dict.keys():
if(dict[key]==1):
dict.pop(key);
def removePartials(word):
"""Removes all the partial occurences of a given word from the dictionary."""
for i in range(3,len(word)):
for j in range(0,len(word)-i+1):
for key in dict.keys():
if key==word[j:j+i] and dict[key]==dict[word]:
dict.pop(key);
def removeAllPartials():
"""Removes all partial words in the dictionary"""
for word in dict.keys():
removePartials(word);
for i in range(3,max(map(lambda x: len(x), sentences))):
countWords(i);
cropDictionary();
removeAllPartials();
print dict;
</code></pre>
<h2>输出</h2>
^{pr2}$
<h2>对读者的一些挑战</h2>
<ul>
<li>打印前按值对词典进行排序。(<a href="https://stackoverflow.com/questions/613183/python-sort-a-dictionary-by-value">Sort a Python dictionary by value</a>)</li>
<li>在这个例子中,“bob”出现了6次,2次是“bobby”的部分单词。确定这是否有问题,并在必要时进行修复。在</li>
<li>把资本化考虑在内。在</li>
</ul>