<p>保罗的答案也许是最好的,但以下是我在午休时想到的(诚然,这还没有经过测试,但仍然是一个有趣的思考练习)。不确定我的算法是否快速/优化。我个人建议看看类似MongoDB的NoSQL数据库,因为它似乎可以很好地解决此类问题(map/reduce等等)</p>
<pre><code># assuming events is a dictionary of id keyed to item bought...
user = {}
for (cust_id, item) in events:
if not cust_id in users:
user[cust_id] = set()
user[cust_id].add(item)
# now we have a dictionary of cust_ids keyed to a set of every item
# they've ever bought (given that repeats don't matter)
# now we construct a dict of items keyed to a dictionary of other items
# which are in turn keyed to num times present
items = {}
def insertOrIter(d, k, v):
if k in d:
d[k] += v
else:
d[k] = v
for key in user:
# keep track of items bought with each other
itemsbyuser = []
for item in user[key]:
# make sure the item with dict is set up
if not item in items:
items[item] = {}
# as we see each item, add to it others and others to it
for other in itemsbyuser:
insertOrIter(items[other], item, 1)
insertOrIter(items[item], other, 1)
itemsbyuser.append(item)
# now, unless i've screwed up my logic, we have a dictionary of items keyed
# to a dictionary of other items keyed to how many times they've been
# bought with the first item. *whew*
# If you want something more (potentially) useful, we just turn that around to be a
# dictionary of items keyed to a list of tuples of (times seen, other item) and
# you're good to go.
useful = {}
for i in items:
temp = []
for other in items[i]:
temp[].append((items[i][other], other))
useful[i] = sorted(temp, reverse=True)
# Now you should have a dictionary of items keyed to tuples of
# (number times bought with item, other item) sorted in descending order of
# number of times bought together
</code></pre>