<p>我认为取每个组的max元素(<code>O(|elements|)</code>),然后取n个最大的列(<code>O(|groups|.lg n)</code>,堆的大小是<code>n</code>),而不是先排序(<code>O(|elements|.lg |elements|)</code>),然后取<code>n</code>元素(<code>O(|elements|)</code>):</p>
<p>创建一个dict<code>max_by_tag</code>,用于存储带有max rank by标记的项:</p>
<pre><code>>>> from collections import namedtuple
>>> Data = namedtuple('Data', ('tag', 'rank'))
>>> n = 3
>>> algorithm_input = { Data('a', 200), Data('a', 100), Data('b', 50), Data('c', 10), Data('d', 5) }
>>> max_by_tag = {}
>>> for item in algorithm_input:
... if item.tag not in max_by_tag or item.rank > max_by_tag[item.tag].rank:
... max_by_tag[item.tag] = item
>>> max_by_tag
{'a': Data(tag='a', rank=200), 'b': Data(tag='b', rank=50), 'c': Data(tag='c', rank=10), 'd': Data(tag='d', rank=5)}
</code></pre>
<p>然后使用<a href="https://docs.python.org/3/library/heapq.html#heapq.nlargest" rel="nofollow noreferrer">^{<cd8>}</a>模块:</p>
<pre><code>>>> import heapq
>>> heapq.nlargest(n, max_by_tag.values(), key=lambda data: data.rank)
[Data(tag='a', rank=200), Data(tag='b', rank=50), Data(tag='c', rank=10)]
</code></pre>