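<p>Your algorithm is quadratic in the number of phrases, which is probably what is slowing it down. Here I index the phrases by word, so that in the common case the filtering runs in less than quadratic time:</p>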
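<pre><code>from collections import defaultdict

def filter_phrases(phrases):
    # assumes words are separated by whitespace
    words = [set(p.split()) for p in phrases]

    # build index: word -> indices of the phrases containing it
    index = defaultdict(set)
    for i, ws in enumerate(words):
        for w in ws:
            index[w].add(i)

    # use the index to filter out phrases that contain all the words
    # from another phrase
    removed = set()
    for i, ws in enumerate(words):
        if not ws:
            continue  # skip empty phrases: no words to intersect
        # siblings: every phrase that has at least all of our words
        siblings = set.intersection(*(index[w] for w in ws))
        siblings.discard(i)  # a phrase always contains its own words
        removed |= siblings

    return [p for i, p in enumerate(phrases) if i not in removed]
</code></pre>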
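<p>To make the index concrete, here is a small hand-sized example. The three phrases are made up for illustration, and words are assumed to be whitespace-separated. It builds the word-to-phrases index and runs the intersection step for the first phrase only. Each phrase is intersected against the phrases listed under its own words instead of being compared with every other phrase, which is roughly where the saving over a pairwise comparison comes from when most words appear in only a few phrases.</p>

```python
from collections import defaultdict

# hypothetical sample data, for illustration only
phrases = ["red apple", "big red apple", "green pear"]

# inverted index: word -> set of indices of the phrases containing that word
index = defaultdict(set)
for i, phrase in enumerate(phrases):
    for word in phrase.split():
        index[word].add(i)

# phrases that contain every word of phrase 0 ("red apple")
siblings = set.intersection(*(index[w] for w in phrases[0].split()))
siblings.discard(0)  # ignore the phrase itself

print(sorted(siblings))  # [1]
```

<p>Here only "big red apple" (index 1) contains both "red" and "apple", so it is the phrase that would be filtered out; "green pear" is never looked at while processing "red apple".</p>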