<p><strong>Solution:</strong></p>
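<pre><code>#!/usr/bin/env python


def readdata(filename):
    """Yield ([word1, word2], count, min_year) for each run of equal word pairs.

    Assumes the input is sorted so that identical pairs are adjacent.
    """
    last = []   # [word1, word2, min_year] for the current group
    count = 0   # number of lines seen in the current group

    with open(filename, "r") as fd:
        for line in fd:
            tokens = line.split()
            if not tokens:
                continue  # skip blank lines
            tokens[2] = int(tokens[2])

            if not last:
                # The first data line starts the first group.
                last = tokens
                count = 1
            elif tokens[:2] != last[:2]:
                # The pair changed: emit the finished group, start a new one.
                yield last[:2], count, last[2]
                last = tokens
                count = 1
            else:
                # Same pair: count it and keep the smallest year seen so far.
                count += 1
                last[2] = min(last[2], tokens[2])

    if last:
        # Emit the final group (an empty file yields nothing).
        yield last[:2], count, last[2]


with open("output.txt", "w") as fd:
    for words, count, year in readdata("data.txt"):
        fd.write(
            "{0:s} {1:s} ({2:d} {3:d})\n".format(
                words[0], words[1], count, year
            )
        )
</code></pre>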
<p><strong>Output:</strong></p>
^{pr2}$
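<p><strong>Discussion:</strong></p>
<ul>
<li>It reads and processes the data iteratively (<em>Python 2.x</em>), so it never loads the whole file into memory, which makes it suitable for very large data files.</li>
<li>As long as the input data is sorted, no complex data structures are needed either. We only need to keep track of the last set of tokens and the minimum year for each set of "duplicates".</li>
</ul>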
<p>The actual algorithm is very similar to <a href="https://docs.python.org/2/library/itertools.html#itertools.groupby" rel="nofollow">itertools.groupby</a> (<em>see the other answer that uses it, but assumes Python 3.x</em>).</p>
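<p>For comparison, here is a minimal sketch of the same grouping idea written with <code>itertools.groupby</code>. It is not the other answer's code: the helper name <code>summarize</code> and the sample lines are made up for illustration, and it assumes each non-blank line has the form <code>word1 word2 year</code> with equal pairs already adjacent. Unlike the generator above, it collects the years of one group in a list, so memory use grows with the size of the largest group.</p>

```python
from itertools import groupby


def summarize(lines):
    """Yield (word1, word2, count, min_year) for each run of equal word pairs.

    Hypothetical helper: assumes "word1 word2 year" lines, sorted so that
    lines sharing the same pair are adjacent.
    """
    rows = (line.split() for line in lines)
    rows = (row for row in rows if row)  # drop blank lines
    for (first, second), group in groupby(rows, key=lambda row: (row[0], row[1])):
        # groupby only merges *consecutive* rows with the same key.
        years = [int(row[2]) for row in group]
        yield first, second, len(years), min(years)


# Made-up sample data, not taken from the question.
sample = [
    "apple pie 2001",
    "apple pie 1999",
    "apple tart 2005",
    "berry jam 2010",
    "berry jam 2012",
    "berry jam 2008",
]

for first, second, count, year in summarize(sample):
    print("{0} {1} ({2} {3})".format(first, second, count, year))
```

<p>Because <code>groupby</code> only merges consecutive rows, this sketch depends on sorted input in the same way as the hand-written loop. Passing an open file object instead of the list would process the file line by line.</p>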
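<p>It may be worth noting that this implementation also runs in <code>O(n)</code> time (<em><a href="http://en.wikipedia.org/wiki/Big_O_notation" rel="nofollow">Big O</a></em>), since each input line is read and compared only once.</p>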