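<p>As far as I know, there is no built-in function that does what you want. The best approach seems to be essentially what you are already doing: group the rivers by number of stations, sort by that number, and take the first <code>N</code> groups from the sorted list.</p>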
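<p><code>Counter.most_common</code> is the obvious built-in candidate, but it returns exactly <code>n</code> elements and breaks ties by first-seen order. Any river that ties with the last one kept is silently dropped. Here is a small illustration with made-up river names:</p>
<pre class="lang-py prettyprint-override"><code>from collections import Counter

# Made-up river names, one entry per station
rivers = ["Thames", "Thames", "Cam", "Cam", "Ouse"]
counts = Counter(rivers)

# "Thames" and "Cam" tie with 2 stations each, but most_common(1)
# returns a single pair, keeping whichever river was seen first
print(counts.most_common(1))  # [('Thames', 2)]
</code></pre>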
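<p>I would also split the code into two separate functions. The first takes the list of stations and counts them by river name. The second takes those (river name, station count) pairs and extracts the first <code>N</code> count groups.</p>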
<h3>Function to collect stations by river</h3>
<p>The only real way to do this is to loop over all the stations and collect them. <code>Counter</code> handles the counting:</p>
<pre class="lang-py prettyprint-override"><code>from collections import Counter

def collect_stations( stations ):
    """
    :param stations: List of station objects.
    :returns: Dictionary-like object of river name to station count pairs.
    """
    # one river name per station; Counter tallies the duplicates
    names = [ s.river for s in stations ]
    return Counter( names )
</code></pre>
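<p><code>Counter</code> runs the loop for you. A rough equivalent with an explicit loop and a plain <code>dict</code> might look like this. <code>Station</code> is a hypothetical <code>namedtuple</code> that stands in for your station objects, which are assumed to expose a <code>river</code> attribute:</p>
<pre class="lang-py prettyprint-override"><code>from collections import namedtuple

# Hypothetical stand-in for the question's station objects; only .river is used
Station = namedtuple("Station", ["name", "river"])


def collect_stations_loop(stations):
    """Count stations per river with an explicit loop instead of Counter."""
    river_count = {}
    for s in stations:
        # dict.get supplies 0 the first time a river is seen
        river_count[s.river] = river_count.get(s.river, 0) + 1
    return river_count


stations = [
    Station("A", "Thames"),
    Station("B", "Thames"),
    Station("C", "Cam"),
]
print(collect_stations_loop(stations))  # {'Thames': 2, 'Cam': 1}
</code></pre>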
<h3>Function to return the top <code>N</code> station counts</h3>
<p>This first version groups the rivers explicitly by station count:</p>
<pre class="lang-py prettyprint-override"><code>def highest_counts( river_stations, N, flatten = True ):
    """
    :param river_stations: Dictionary-like object of name-count pairs.
    :param N: Number of count groups to return.
    :param flatten: Flatten list of rivers.
    :returns: If flatten is True, a list of ( name, count ) tuples covering the N highest unique counts, i.e. rivers with the same number of stations are treated as one group. If flatten is False, a list of ( count, [ ( name, count ), ... ] ) tuples, one per count group, highest count first.
    """
    # group rivers by number of stations
    grouped = {}
    for name, count in river_stations.items():
        if count not in grouped:
            # add number group if it doesn't exist
            grouped[ count ] = []
        grouped[ count ].append( ( name, count ) )
    # sort groups by number of stations, highest first
    grouped = [ ( c, l ) for c, l in grouped.items() ]
    grouped.sort( key = lambda x: x[ 0 ], reverse = True )
    # get first N number groups
    stats = grouped[ :N ]
    if flatten:
        # merge the groups into a single list of ( name, count ) tuples
        stats = [
            river
            for num_list in stats
            for river in num_list[ 1 ]
        ]
    return stats
</code></pre>
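<p>If you prefer, <code>collections.defaultdict</code> can shorten the grouping step because it creates the empty list the first time a count is seen. Here is a minimal sketch of just that step, using hypothetical input data:</p>
<pre class="lang-py prettyprint-override"><code>from collections import defaultdict

# Hypothetical river -> station-count mapping (e.g. the output of collect_stations)
river_stations = {"Thames": 3, "Cam": 3, "Ouse": 2, "Avon": 1}

# defaultdict(list) creates the empty list on first access,
# replacing the explicit "if count not in grouped" check
grouped = defaultdict(list)
for name, count in river_stations.items():
    grouped[count].append((name, count))

# Sort the (count, rivers) pairs by count, highest first, and keep 2 groups
top_groups = sorted(grouped.items(), key=lambda item: item[0], reverse=True)[:2]
print(top_groups)
# [(3, [('Thames', 3), ('Cam', 3)]), (2, [('Ouse', 2)])]
</code></pre>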
<p>Another approach is to sort the initial list and then take elements until <code>N</code> distinct station counts have been seen:</p>
<pre class="lang-py prettyprint-override"><code>from collections import Counter

def highest_counts( river_stations, N ):
    """
    :param river_stations: Dictionary-like object of name-count pairs.
    :param N: Number of count groups to return.
    :returns: List of ( name, count ) tuples covering the N highest unique counts, i.e. rivers with the same number of stations are treated as one group.
    """
    # sort by number of stations, highest first
    river_stations_list = [ ( name, count ) for name, count in river_stations.items() ]
    s = sorted( river_stations_list, key = lambda x: x[ 1 ], reverse = True )
    # get number of stations for each element
    nums = [ x[ 1 ] for x in s ]
    # size of each count group; Counter keeps first-seen order (Python 3.7+),
    # so these sizes are already ordered from highest count to lowest
    freqs = list( Counter( nums ).values() )
    # number of elements covered by the first N count groups
    ind = sum( freqs[ :N ] )
    # return the elements that belong to the first N count groups
    return s[ :ind ]
</code></pre>
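<p><code>itertools.groupby</code> can express the same sort-then-cut idea. It yields one group per run of equal counts in the sorted list, and <code>itertools.islice</code> keeps the first <code>N</code> runs. This is a swapped-in variant rather than either version above, and <code>highest_counts_groupby</code> is a hypothetical name:</p>
<pre class="lang-py prettyprint-override"><code>from itertools import groupby, islice
from operator import itemgetter


def highest_counts_groupby(river_stations, N):
    """Return (name, count) pairs for the N highest distinct counts."""
    # Sort by count, highest first (sorted is stable, so ties keep their order)
    ordered = sorted(river_stations.items(), key=itemgetter(1), reverse=True)
    # groupby yields one group per run of equal counts; islice keeps the first N runs
    groups = islice(groupby(ordered, key=itemgetter(1)), N)
    # Each group is fully consumed before groupby advances to the next one
    return [pair for _, group in groups for pair in group]


print(highest_counts_groupby({"Thames": 3, "Cam": 3, "Ouse": 2, "Avon": 1}, 2))
# [('Thames', 3), ('Cam', 3), ('Ouse', 2)]
</code></pre>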
<p>A quick performance check showed that the second version is faster for larger inputs.
<a href="https://i.stack.imgur.com/PfEAS.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/PfEAS.png" alt="Timing ratio of versions for differing input sizes."/></a></p>
<h3>Final function</h3>
<p>The final function combines the two functions above:</p>
<pre class="lang-py prettyprint-override"><code>def rivers_by_station_number( stations, N ):
    """
    :param stations: List of station objects.
    :param N: Number of count groups to return.
    :returns: List of ( name, count ) tuples covering the N highest unique counts, i.e. rivers with the same number of stations are treated as one group.
    """
    # count stations per river, then keep the top N count groups
    collected = collect_stations( stations )
    return highest_counts( collected, N )
</code></pre>
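<p>Here is a compact, self-contained variant of the whole pipeline as a usage sketch, run on a hypothetical <code>Station</code> <code>namedtuple</code>. Instead of slicing a sorted list, it keeps every river whose count is among the <code>N</code> highest distinct counts. <code>rivers_by_station_number_set</code> is an illustrative name and is not part of the code above:</p>
<pre class="lang-py prettyprint-override"><code>from collections import Counter, namedtuple

# Hypothetical stand-in for the question's station objects; only .river is used
Station = namedtuple("Station", ["name", "river"])


def rivers_by_station_number_set(stations, N):
    """Count stations per river, then keep rivers whose count is among the N highest distinct counts."""
    counts = Counter(s.river for s in stations)
    # The N highest distinct station counts
    top = set(sorted(set(counts.values()), reverse=True)[:N])
    # Keep every river with one of those counts, highest count first
    kept = [(river, count) for river, count in counts.items() if count in top]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


rivers = ["Thames", "Thames", "Thames", "Cam", "Cam", "Cam", "Ouse", "Ouse", "Avon"]
stations = [Station("station %d" % i, river) for i, river in enumerate(rivers)]
print(rivers_by_station_number_set(stations, 2))
# [('Thames', 3), ('Cam', 3), ('Ouse', 2)]
</code></pre>
<p>With <code>N = 2</code>, the function keeps both rivers tied at 3 stations, plus the single river with 2.</p>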