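<p>One approach, assuming your list is sorted by serial number (which it appears to be), is to run it through a generator that aggregates the rows of each flight together:</p>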
<pre><code>def aggregate_flights(flights):
    out = []
    last_id = ''
    for row in flights:
        # A change of id means the previous flight is complete
        if row[-2] != last_id and len(out) > 0:
            yield (last_id, out)
            out = []
        last_id = row[-2]
        out.append((row[3], row[4]))  # 2-tuple of (start, end)
    # Emit the final flight, unless the input was empty
    if out:
        yield (last_id, out)
</code></pre>
<p>With your sample input (here called <code>agg</code>):</p>
<pre><code>list(aggregate_flights(agg))
Out[21]:
[('156912756', [('083914', '084141')]),
('156912546', [('005500', '010051'), ('010051', '010310')])]
</code></pre>
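<p>It's a bit messy, but you get the idea. For each flight you get a list of <code>(start,end)</code> 2-tuples, which you can process further to get the overall <code>(start,end)</code> for that flight. You could even modify the generator so that it yields only the overall <code>(start,end)</code>, but I prefer to do the processing in smaller, modular pieces that are easy to debug.</p>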
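<p>As a rough sketch of that further processing step, and assuming the segments of each flight are already in chronological order, the overall pair is just the first start and the last end. The helper name <code>overall_span</code> is made up for illustration:</p>

```python
def overall_span(segments):
    """Collapse [(start, end), ...] into a single (start, end) pair.

    Assumes the segments are non-empty and in chronological order.
    """
    return (segments[0][0], segments[-1][1])


# Output of the generator for the sample input shown above
grouped = [
    ('156912756', [('083914', '084141')]),
    ('156912546', [('005500', '010051'), ('010051', '010310')]),
]

spans = [(flight_id, overall_span(segs)) for flight_id, segs in grouped]
```

<p>Keeping this as a separate function means you can test the grouping and the collapsing independently.</p>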
<p>If your input is not sorted, you'll need to accumulate the data with a <code>defaultdict</code>. Give it a <code>list</code> factory and append a <code>(start,end)</code> tuple for each row.</p>
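<p>A minimal sketch of the <code>defaultdict</code> variant might look like this. The function name <code>aggregate_unsorted</code> and the seven-column sample rows are assumptions for illustration; only the positions used above (<code>row[3]</code>, <code>row[4]</code> and <code>row[-2]</code>) matter:</p>

```python
from collections import defaultdict


def aggregate_unsorted(flights):
    """Group (start, end) tuples by flight id, regardless of row order."""
    grouped = defaultdict(list)  # missing ids start out as an empty list
    for row in flights:
        grouped[row[-2]].append((row[3], row[4]))
    return dict(grouped)


# Hypothetical rows: the filler columns are placeholders, not real data
rows = [
    ['a', 'b', 'c', '005500', '010051', '156912546', 'x'],
    ['a', 'b', 'c', '083914', '084141', '156912756', 'x'],
    ['a', 'b', 'c', '010051', '010310', '156912546', 'x'],
]

result = aggregate_unsorted(rows)
```

<p>Note that this holds everything in memory and keeps segments in the order they were read, so you may still want to sort each list before collapsing it.</p>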
<p><strong>Edit:</strong> As requested, here is a modification that yields only a single <code>(start,end)</code> pair per flight:</p>
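<pre><code>def aggregate_flights(flights):
    last_id, start, end = None, None, None
    for row in flights:
        # A change of id means the previous flight is complete
        if row[-2] != last_id and last_id is not None:
            yield (last_id, (start, end))
            start, end = None, None
        if start is None:
            start = row[3]  # first start time seen for this flight
        last_id = row[-2]
        end = row[4]  # keeps being overwritten, so the last end time wins
    # Emit the final flight, unless the input was empty
    if last_id is not None:
        yield (last_id, (start, end))
</code></pre>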
<p>At this point I'd note that the output is getting too ugly (an <code>(id,(start,end))</code> tuple, ugh), so I'd move up to a <code>namedtuple</code> to make things nicer:</p>
<pre><code>from collections import namedtuple
Flight = namedtuple('Flight',['id','start','end'])
</code></pre>
<p>Now you have:</p>
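<pre><code>def aggregate_flights(flights):
    last_id, start, end = None, None, None
    for row in flights:
        # A change of id means the previous flight is complete
        if row[-2] != last_id and last_id is not None:
            yield Flight(last_id, start, end)
            start, end = None, None
        if start is None:
            start = row[3]  # first start time seen for this flight
        last_id = row[-2]
        end = row[4]  # keeps being overwritten, so the last end time wins
    # Emit the final flight, unless the input was empty
    if last_id is not None:
        yield Flight(last_id, start, end)

list(aggregate_flights(agg))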
Out[18]:
[Flight(id='156912756', start='083914', end='084141'),
Flight(id='156912546', start='005500', end='010310')]
</code></pre>
<p>Much better.</p>