<p>I assume the input file has one line per minute, giving the start time, the segment id and the value of the indicator.</p>
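<p>If the number of segments is compatible with the available memory, I would simply read the input file once, line by line, and add each minute to one of 8 counters per segment: one for each combination of time-of-day period and indicator value. That means the initial file is read only once and never sorted. The only critical quantity is the number of segments: if it becomes too high, I would use a sqlite3 or dbm database instead of a dict.</p>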
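<p>As a rough illustration of the sqlite3 fallback, here is a minimal sketch that keeps the same 8 counters per segment in a table instead of a dict. The table name <code>counts</code>, the function names and the period boundaries (AM 6-10, IP 10-16, PM 16-19, OP otherwise) are assumptions chosen for this example, not a definitive design.</p>

```python
import sqlite3

def period_index(hour):
    """Map an hour (0-23) to a period index: 0=AM, 1=IP, 2=PM, 3=OP."""
    if 6 <= hour < 10:
        return 0
    if 10 <= hour < 16:
        return 1
    if 16 <= hour < 19:
        return 2
    return 3

def open_counts(path=":memory:"):
    """Create the (hypothetical) counts table if needed and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counts ("
        "seg TEXT, period INTEGER, indic INTEGER, n INTEGER NOT NULL, "
        "PRIMARY KEY (seg, period, indic))"
    )
    return conn

def add_minute(conn, seg, hour, indic):
    """Add one minute to the counter for (segment, period, indicator)."""
    key = (seg, period_index(hour), indic)
    cur = conn.execute(
        "UPDATE counts SET n = n + 1 "
        "WHERE seg = ? AND period = ? AND indic = ?", key)
    if cur.rowcount == 0:  # no counter yet for this key: create it
        conn.execute(
            "INSERT INTO counts (seg, period, indic, n) VALUES (?, ?, ?, 1)", key)

def percent(conn, seg, period):
    """Rounded percentage of minutes with indicator 1, or 0 when there is no data."""
    rows = dict(conn.execute(
        "SELECT indic, n FROM counts WHERE seg = ? AND period = ?", (seg, period)))
    total = rows.get(0, 0) + rows.get(1, 0)
    return int(.5 + 100 * rows.get(1, 0) / total) if total else 0
```

<p>Passing a file path to <code>open_counts</code> instead of the default <code>":memory:"</code> keeps the counters on disk; in that case <code>conn.commit()</code> should be called once the input has been read.</p>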
<p>For your current example (which is <strong>not</strong> csv), the code could be:</p>
<pre><code>import re
import sys

class Segment:
    labels = ['AM', 'IP', 'PM', 'OP']  # time-of-day periods

    def __init__(self, segid):
        self.id = segid
        # one [indicator 0, indicator 1] pair of minute counters per period
        self.values = [[0, 0] for i in range(4)]

    def add(self, hour, indic):
        ix = 3                          # default: OP
        if 6 <= hour < 10: ix = 0       # AM
        elif 10 <= hour < 16: ix = 1    # IP
        elif 16 <= hour < 19: ix = 2    # PM
        self.values[ix][indic] += 1

    def percent(self, ix):
        # rounded percentage of minutes with indicator 1 in period ix
        try:
            return int(.5 + (100 * self.values[ix][1] /
                             (self.values[ix][0] + self.values[ix][1])))
        except ZeroDivisionError:
            return 0

fd = sys.stdin  # assumption: input arrives on standard input; any open text file works
dummy = next(fd)  # discard the first line (presumably a header)
splitter = re.compile(' +')
segments = dict()
for line in fd:  # read and store
    d, seg, indic = splitter.split(line.strip())  # could be replaced with a csv reader
    hour = int(d[11:13])
    if seg not in segments:
        segments[seg] = Segment(seg)
    segments[seg].add(hour, int(indic))
for seg in sorted(segments.keys()):  # output the stats
    for ix in range(4):
        print(seg, Segment.labels[ix], segments[seg].percent(ix))
</code></pre>
<p>The code above lacks tests for errors or exceptional conditions.</p>
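<p>One possible way to add such checks, as a sketch only: a hypothetical helper <code>parse_line</code> that returns <code>None</code> for a malformed line instead of raising. It assumes three space-separated fields, a timestamp with no embedded spaces whose hour sits at positions 11-12 (for example <code>2015-01-01T06:30:00</code>, a format assumed here), and an indicator that is 0 or 1.</p>

```python
import re

SPLITTER = re.compile(r' +')

def parse_line(line):
    """Return (segment, hour, indicator), or None if the line is malformed."""
    fields = SPLITTER.split(line.strip())
    if len(fields) != 3:          # wrong number of columns (or empty line)
        return None
    d, seg, indic = fields
    try:
        hour = int(d[11:13])      # raises ValueError if the timestamp is too short
        indic = int(indic)
    except ValueError:
        return None
    if not 0 <= hour <= 23 or indic not in (0, 1):
        return None               # out-of-range hour or unexpected indicator
    return seg, hour, indic
```

<p>In the reading loop, lines for which <code>parse_line</code> returns <code>None</code> could then be skipped or logged rather than aborting the whole run.</p>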