<h2>模拟多键词典</h2>
<p><a href="https://pypi.python.org/pypi/multi_key_dict" rel="nofollow noreferrer">^{<cd1>}</a>不允许<a href="https://docs.python.org/3/reference/datamodel.html#object.__getitem__" rel="nofollow noreferrer">^{<cd2>}</a>同时具有多个密钥。。。在</p>
<p>(例如<code>d["red", "green"]</code>)</p>
<p>多键可以用<a href="https://docs.python.org/3.5/library/stdtypes.html#tuples" rel="nofollow noreferrer">^{<cd4>}</a>或<a href="https://docs.python.org/3.5/library/stdtypes.html#set" rel="nofollow noreferrer">^{<cd5>}</a>键来模拟。如果顺序无关紧要,<a href="https://docs.python.org/3.5/library/stdtypes.html#set" rel="nofollow noreferrer">^{<cd5>}</a>似乎是最好的(实际上是散列的<a href="https://docs.python.org/3.5/library/stdtypes.html#frozenset" rel="nofollow noreferrer">^{<cd7>}</a>,因此[<code>"red", "blue"]</code>与<code>["blue", "red"]</code>相同。在</p>
<h2>模拟多元词典</h2>
<p>多值是使用某些数据类型固有的,它可以是<a href="https://docs.python.org/3.5/library/stdtypes.html#mapping-types-dict" rel="nofollow noreferrer">any <em>storage</em> element</a>,可以方便地编制索引。标准<a href="https://docs.python.org/3.5/library/stdtypes.html#dict" rel="nofollow noreferrer">^{<cd10>}</a>应该提供这一点。在</p>
<h2>非决定论</h2>
<p>使用由规则和假设定义的概率分布<sup>1</sup>,使用python文档中的<a href="https://docs.python.org/3/library/random.html#examples-and-recipes" rel="nofollow noreferrer">this recipe</a>执行非确定性选择。在</p>
<h2><code>MultiKeyMultiValNonDeterministicDict</code>类</h2>
<p>多好的名字啊。\o/<sup><sup>-不错!</sup></sup></sup></p>
<p>此类接受多个键,这些键定义了一个由多个值组成的概率规则集。在项目创建(<a href="https://docs.python.org/3/reference/datamodel.html#object.__setitem__" rel="nofollow noreferrer">^{<cd12>}</a>)期间,所有键组合的值概率都会预先计算<sup>1</sup>。在项目访问(<a href="https://docs.python.org/3/reference/datamodel.html#object.__getitem__" rel="nofollow noreferrer">^{<cd2>}</a>)期间,选择预先计算的概率分布,并基于随机加权选择对结果进行评估。在</p>
<h2>定义</h2>
<pre><code>import random
import operator
import bisect
import itertools
# or use itertools.accumulate in python 3
def accumulate(iterable, func=operator.add):
'Return running totals'
# accumulate([1,2,3,4,5]) --> 1 3 6 10 15
# accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
it = iter(iterable)
try:
total = next(it)
except StopIteration:
return
yield total
for element in it:
total = func(total, element)
yield total
class MultiKeyMultiValNonDeterministicDict(dict):
def key_combinations(self, keys):
"""get all combinations of keys"""
return [frozenset(subset) for L in range(0, len(keys)+1) for subset in itertools.combinations(keys, L)]
def multi_val_rule_prob(self, rules, rule):
"""
assign probabilities for each value,
spreading undefined result probabilities
uniformly over the leftover results not defined by rule.
"""
all_results = set([result for result_probs in rules.values() for result in result_probs])
prob = rules[rule]
leftover_prob = 1.0 - sum([x for x in prob.values()])
leftover_results = len(all_results) - len(prob)
for result in all_results:
if result not in prob:
# spread undefined prob uniformly over leftover results
prob[result] = leftover_prob/leftover_results
return prob
def multi_key_rule_prob(self, key, val):
"""
assign probability distributions for every combination of keys,
using the default for combinations not defined in rule set
"""
combo_probs = {}
for combo in self.key_combinations(key):
if combo in val:
result_probs = self.multi_val_rule_prob(val, combo).items()
else:
result_probs = self.multi_val_rule_prob(val, frozenset([])).items()
combo_probs[combo] = result_probs
return combo_probs
def weighted_random_choice(self, weighted_choices):
"""make choice from weighted distribution"""
choices, weights = zip(*weighted_choices)
cumdist = list(accumulate(weights))
return choices[bisect.bisect(cumdist, random.random() * cumdist[-1])]
def __setitem__(self, key, val):
"""
set item in dictionary,
assigns values to keys with precomputed probability distributions
"""
precompute_val_probs = self.multi_key_rule_prob(key, val)
# use to show ALL precomputed probabilities for key's rule set
# print precompute_val_probs
dict.__setitem__(self, frozenset(key), precompute_val_probs)
def __getitem__(self, key):
"""
get item from dictionary,
randomly select value based on rule probability
"""
key = frozenset([key]) if isinstance(key, str) else frozenset(key)
val = None
weighted_val = None
if key in self.keys():
val = dict.__getitem__(self, key)
weighted_val = val[key]
else:
for k in self.keys():
if key.issubset(k):
val = dict.__getitem__(self, k)
weighted_val = val[key]
# used to show probabality for key
# print weighted_val
if weighted_val:
prob_results = self.weighted_random_choice(weighted_val)
else:
prob_results = None
return prob_results
</code></pre>
<h2>使用</h2>
^{pr2}$
<h2>测试</h2>
<p>检查概率</p>
<pre><code>N = 10000
red_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
default_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
for _ in xrange(N):
red_green_test[d["red","green"]] += 1.0
red_blue_test[d["red","blue"]] += 1.0
blue_test[d["blue"]] += 1.0
default_test[d["green"]] += 1.0
red_blue_green_test[d["red","blue","green"]] += 1.0
print 'red,green test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_green_test.items())
print 'red,blue test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_test.items())
print 'blue test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in blue_test.items())
print 'default test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in default_test.items())
print 'red,blue,green test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_green_test.items())
</code></pre>
<hr/>
<pre><code>red,green test = car: 09.89% toy: 10.06% ballon: 80.05%
red,blue test = car: 05.30% toy: 47.71% ballon: 46.99%
blue test = car: 41.69% toy: 15.02% ballon: 43.29%
default test = car: 05.03% toy: 47.16% ballon: 47.81%
red,blue,green test = car: 04.85% toy: 49.20% ballon: 45.95%
</code></pre>
<h3>概率符合规则!</h3>
<hr/>
<h3>脚注</h3>
<ol>
<li><p>分布假设</p>
<p>由于规则集没有完全定义,所以对概率分布进行了假设,大部分假设都是在^{<cd14>中完成的。基本上,任何未定义的概率都将均匀地分布在剩余值上。这是针对<em>所有</em>键组合进行的,并为随机加权选择创建一个通用键接口。在</p>
<p>给出示例规则集</p>
<pre><code>d["red","blue","green"] = {
# {rule_set} : {result: probability}
frozenset(["red", "green"]): {"ballon": 0.8},
frozenset(["blue"]): {"toy": 0.15},
frozenset([]): {"car": 0.05}
}
</code></pre>
<p>这将创建以下发行版</p>
<pre><code>'red' = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'green' = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'blue' = [('car', 0.425), ('toy', 0.150), ('ballon', 0.425)]
'blue,red' = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'green,red' = [('car', 0.098), ('toy', 0.098), ('ballon', 0.800)]
'blue,green' = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'blue,green,red'= [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
default = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
</code></pre>
<p>如果这是不正确的,请告知。</p></li>
</ol>