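<p>This solution is nearly identical to @Peter Gibson's. In this version the index <em>is</em> the data, and there is no delegated <em>docSets</em> object involved. That makes the code slightly shorter and clearer.</p>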
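<p>The code also preserves the original order of the documents, which is arguably a bug (a term repeated within one document records that document's index more than once). I prefer Peter's <code>set()</code> implementation.</p>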
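<p>To illustrate the difference, here is a minimal sketch of a set-based variant. It is not Peter's actual code; the class name <code>SetInvertedIndex</code> and the choice to return a sorted list from <code>search</code> are assumptions made for this example.</p>

```python
class SetInvertedIndex(dict):
    """Hypothetical variant: postings are sets, so repeated terms collapse."""

    def __init__(self, docs):
        super().__init__()
        self.docs = docs
        for doc_index, doc in enumerate(docs):
            for term in doc.split():
                # __missing__ supplies an empty set on first sight of a term
                self[term].add(doc_index)

    def __missing__(self, term):
        # behave like defaultdict(set)
        self[term] = set()
        return self[term]

    def search(self, term):
        # get() does not trigger __missing__, so lookups never add keys;
        # sorted() returns the document indices in ascending order
        return sorted(self.get(term, ()))


docs = ["new home sales top forecasts june june june",
        "home sales rise in july june"]
ix = SetInvertedIndex(docs)
print(ix.search("june"))     # [0, 1]
print(ix.search("whiskey"))  # []
```

<p>With list postings, the same two documents would yield <code>[0, 0, 0, 1]</code> for <code>june</code>, because the term occurs three times in the first document.</p>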
<p>Also note that referencing a nonexistent item, such as <code>ix['garbage']</code>, implicitly modifies the index. That is fine if the only API is <code>search</code>, but the case is worth noting.</p>
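<p>The side effect can be seen in a small sketch. It assumes a stripped-down copy of the class with only the constructor and <code>__missing__</code>; the variable names <code>before</code>, <code>after_get</code> and <code>after_subscript</code> are illustrative only.</p>

```python
class InvertedIndex(dict):
    def __init__(self, docs):
        super().__init__()
        for doc_index, doc in enumerate(docs):
            for term in doc.split():
                self[term].append(doc_index)

    def __missing__(self, term):
        # behave like defaultdict(list): store and return an empty list
        self[term] = []
        return self[term]


ix = InvertedIndex(["beer"])

before = len(ix)            # only 'beer' is indexed
ix.get("garbage")           # dict.get never calls __missing__
after_get = len(ix)
ix["garbage"]               # subscripting calls __missing__, which stores []
after_subscript = len(ix)

print(before, after_get, after_subscript)  # 1 1 2
print("garbage" in ix)                     # True
```

<p>Because <code>search</code> is built on <code>get</code>, searching for an unknown term leaves the index untouched. Only direct subscripting adds the empty entry.</p>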
<h2>Source</h2>
<pre><code>class InvertedIndex(dict):
    def __init__(self, docs):
        self.docs = docs
        for doc_index, doc in enumerate(docs):
            for term in doc.split(" "):
                self[term].append(doc_index)

    def __missing__(self, term):
        # operate like defaultdict(list)
        self[term] = []
        return self[term]

    def search(self, term):
        # get() bypasses __missing__, so searching never adds keys
        return self.get(term) or 'No results'


docs = ["new home sales top forecasts june june june",
        "home sales rise in july june",
        "increase in home sales in july",
        "july new home sales rise",
        'beer',
        ]

ix = InvertedIndex(docs)
print(ix.__dict__)
print()
print('sales:', ix.search("sales"))
print('whiskey:', ix.search('whiskey'))
print('beer:', ix.search('beer'))

print('\nTEST OF KEY SETTING')
print(ix['garbage'])         # subscripting a missing key stores an empty list
print('garbage' in ix)
print(ix.search('garbage'))
</code></pre>
<h2>Output</h2>
^{pr2}$