<h2>Overview</h2>
<p>You can use the code from this answer to extract the names, passing it a list such as [david, bob, etc.]:</p>
<p><a href="https://stackoverflow.com/questions/15364975/is-there-an-easy-way-generate-a-probable-list-of-words-from-an-unspaced-sentence/15367466#15367466">Is there an easy way generate a probable list of words from an unspaced sentence in python?</a></p>
<p>Then use <code>collections.Counter</code> to get the frequencies.</p>
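<p>As a minimal sketch of the counting step on its own, assuming the names have already been pulled out into a list (the <code>extracted</code> list here is made up for illustration):</p>

```python
from collections import Counter

# Hypothetical list of names already extracted from the URLs.
extracted = ["david", "bob", "mike", "joe", "mike", "joe"]

# Counter tallies how many times each name occurs.
counts = Counter(extracted)

# Names that never occurred read as 0 instead of raising KeyError.
missing = counts["karl"]

for name, freq in counts.most_common():
    print(name, freq)
```

<p><code>most_common()</code> lists the names from most to least frequent, which is usually the form you want for a frequency report.</p>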
<h2>Code</h2>
<pre><code>from Bio import trie  # needs a Biopython build that still ships Bio.trie
import string
from collections import Counter


def get_trie(words):
    """Build a trie holding every known name."""
    tr = trie.trie()
    for word in words:
        tr[word] = len(word)
    return tr


def get_trie_word(tr, s):
    """Return the longest known name that prefixes s, plus the remainder."""
    for end in reversed(range(len(s))):
        word = s[:end + 1]
        if tr.has_key(word):
            return word, s[end + 1:]
    return None, s


def get_trie_words(s):
    names = ['david', 'bob', 'karl', 'joe', 'mike']
    tr = get_trie(names)
    while s:
        word, s = get_trie_word(tr, s)
        if word is None:
            # No name starts here: drop one character so the loop cannot spin forever.
            s = s[1:]
            continue
        yield word


def main(urls):
    d = Counter()
    for url in urls:
        # Keep lowercase ASCII letters only (drops digits and punctuation).
        url = "".join(a for a in url if a in string.ascii_lowercase)
        for word in get_trie_words(url):
            d[word] += 1
    return d


if __name__ == '__main__':
    urls = [
        "davidbobmike1joe",
        "mikejoe2bobkarl",
        "joemikebob",
        "bobjoe",
    ]
    print(main(urls))
</code></pre>
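<p>In case <code>Bio.trie</code> is not available in your Biopython installation, the same greedy longest-prefix split can be sketched with a plain <code>set</code>. This is an illustrative alternative, not the code from the linked answer; the helper names <code>split_names</code> and <code>count_names</code> are hypothetical.</p>

```python
import string
from collections import Counter

NAMES = {"david", "bob", "karl", "joe", "mike"}


def split_names(s, names=NAMES):
    """Yield known names found by greedy longest-prefix matching."""
    max_len = max(len(n) for n in names)
    i = 0
    while i < len(s):
        # Try the longest candidate starting at position i first.
        for end in range(min(len(s), i + max_len), i, -1):
            if s[i:end] in names:
                yield s[i:end]
                i = end
                break
        else:
            # No name starts here; skip one character and keep going.
            i += 1


def count_names(urls, names=NAMES):
    """Count how often each known name appears across all URLs."""
    counts = Counter()
    for url in urls:
        # Keep lowercase ASCII letters only, as in the code above.
        letters = "".join(c for c in url if c in string.ascii_lowercase)
        counts.update(split_names(letters, names))
    return counts


urls = ["davidbobmike1joe", "mikejoe2bobkarl", "joemikebob", "bobjoe"]
result = count_names(urls)
print(result)
```

<p>A set lookup per candidate prefix is fine for a short list of names; a real trie pays off mainly when the vocabulary is large.</p>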
<h2>Result</h2>
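<p>For the four sample URLs, the counter holds <code>bob</code>: 4, <code>joe</code>: 4, <code>mike</code>: 3, <code>david</code>: 1 and <code>karl</code>: 1.</p>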