<p>Store the document indices in a Python <a href="https://docs.python.org/2/library/sets.html" rel="noreferrer">set</a>, and use a dict to map each term to its "doc set".</p>
<pre><code>from collections import defaultdict

class invertedIndex(object):
    def __init__(self, docs):
        # Map each term to the set of indices of the documents that contain it.
        self.docSets = defaultdict(set)
        for index, doc in enumerate(docs):
            for term in doc.split():
                self.docSets[term].add(index)

    def search(self, term):
        # Return the set of document indices for the term (empty set if unseen).
        return self.docSets[term]

docs = ["new home sales top forecasts june june june",
        "home sales rise in july june",
        "increase in home sales in july",
        "july new home sales rise"]

i = invertedIndex(docs)
print(i.search("sales"))  # Python 3 prints {0, 1, 2, 3}; Python 2 prints set([0, 1, 2, 3])
</code></pre>
<p>A <code>set</code> works somewhat like a list, but it is unordered and cannot contain duplicate entries.</p>
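<p>To illustrate why a set suits this job, here is a minimal sketch. The variable names (<code>postings</code>, <code>july_docs</code>, <code>rise_docs</code>) are made up for this example, and the index values are copied by hand from the sample documents rather than computed. It shows that repeated occurrences of a term in one document collapse to a single entry, and that set intersection gives the documents containing two terms at once.</p>

```python
# Adding the same document index several times leaves a single entry.
# "june" occurs three times in document 0 and once in document 1.
postings = set()
for doc_index in [0, 0, 0, 1]:
    postings.add(doc_index)
print(sorted(postings))  # [0, 1]

# Hypothetical posting sets, written out by hand from the sample docs:
july_docs = {1, 2, 3}  # documents containing "july"
rise_docs = {1, 3}     # documents containing "rise"

# Intersection keeps only the documents that contain both terms.
both = july_docs & rise_docs
print(sorted(both))  # [1, 3]
```

<p>The results are sorted before printing because a set has no guaranteed order.</p>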
<p><code>defaultdict</code> is basically a <code>dict</code> that supplies a default value (in this case an empty set) when a key has no data yet.</p>
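<p>A short sketch of that behaviour, using illustrative names (<code>doc_sets</code>, <code>plain</code>) that are not part of the answer's class. One detail worth knowing: merely looking up a missing key with <code>[]</code> stores an empty set under that key, so a lookup that should leave the index untouched can use <code>.get()</code> instead.</p>

```python
from collections import defaultdict

doc_sets = defaultdict(set)
doc_sets["home"].add(0)  # no KeyError: an empty set is created for "home" first
doc_sets["home"].add(1)

# A plain dict raises KeyError in the same situation.
plain = {}
try:
    plain["home"].add(0)
    raised = False
except KeyError:
    raised = True

# Side effect: reading a missing key with [] inserts an empty set for it.
missing = doc_sets["unknown"]
inserted_by_lookup = "unknown" in doc_sets

# .get() does not call the default factory, so the key is not inserted.
fallback = doc_sets.get("other", set())
inserted_by_get = "other" in doc_sets

print(sorted(doc_sets["home"]), raised, inserted_by_lookup, inserted_by_get)
```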