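<p>My two cents:<br/>
- Python 2.7.5<br/>
- I used a <code>defaultdict</code> to hold the previous rows for each <em>name</em>.<br/>
- I used bounded-length deques to hold the previous rows because I like the FIFO behavior of a full deque. It makes the logic easy for me to think about: just keep stuffing things in.<br/>
- I used <code>operator.itemgetter()</code> for indexing and slicing because it reads better.</p>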
<pre><code>from collections import deque, defaultdict
import csv
from functools import partial
from operator import itemgetter

# use a 3 item deque to hold the
# previous three rows for each name
deck3 = partial(deque, maxlen=3)
data = defaultdict(deck3)

name = itemgetter(2)
date = itemgetter(1)
sixplus = itemgetter(slice(6, None))

fields = ['Datatitle', 'Date', 'Name', 'Score', 'Parameter',
          'LTscore', 'LTParameter', 'LTscore+1', 'LTParameter+1',
          'LTscore+2', 'LTParameter+3']

with open('data.txt') as infile, open('processed.txt', 'wb') as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    writer.writerow(fields)
    # comment out the next line if the data file does not have a header row
    next(reader)
    for row in reader:
        default = deque(['x', 'y', 'x', 'y', 'x', 'y'], maxlen=6)
        try:
            previous_row = data[name(row)][-1]
            previous_date = date(previous_row)
        except IndexError:
            # first row seen for this name
            previous_date = None
        if previous_date == date(row):
            # reuse the extra columns from last time
            row.extend(sixplus(previous_row))
            # discard the previous row because
            # there is a new row with the same date
            data[name(row)].pop()
        else:
            # add columns 3 and 4 from each previous row
            for deck in data[name(row)]:
                # adding new items to a full deque causes
                # items to drop off the other end
                default.appendleft(deck[4])
                default.appendleft(deck[3])
            row.extend(default)
        writer.writerow(row)
        data[name(row)].append(row)
</code></pre>
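<p>The code above leans on three small behaviors, shown here in isolation. A deque created with <code>maxlen</code> drops an item from the opposite end once it is full, which is how <code>appendleft</code> pushes the <code>'x'</code>/<code>'y'</code> placeholders out. <code>defaultdict(partial(deque, maxlen=3))</code> hands each new name its own three-slot deque. <code>itemgetter</code> with a slice object behaves like ordinary slicing. The variable names and sample values are illustrative only.</p>

```python
from collections import deque, defaultdict
from functools import partial
from operator import itemgetter

# A full bounded deque discards from the opposite end on each new item.
default = deque(['x', 'y', 'x', 'y', 'x', 'y'], maxlen=6)
default.appendleft('p1')  # the rightmost 'y' falls off
default.appendleft('s1')  # the rightmost 'x' falls off
print(list(default))      # ['s1', 'p1', 'x', 'y', 'x', 'y']

# Each missing key gets a fresh deque that keeps only the last three items.
history = defaultdict(partial(deque, maxlen=3))
for label in ['r1', 'r2', 'r3', 'r4']:
    history['bob'].append(label)
print(list(history['bob']))  # ['r2', 'r3', 'r4']

# itemgetter(slice(6, None)) returns the same thing as row[6:].
sixplus = itemgetter(slice(6, None))
row = list('abcdefghij')
print(sixplus(row))  # ['g', 'h', 'i', 'j']
```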
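<p>After mulling this solution over a glass of port, I realized it is too complicated, which tends to happen when I try to get fancy. I'm not sure of the etiquette here, so I'll leave it in. It does have one possible advantage: it keeps the previous three rows for each name.</p>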
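<p>Here is a solution that uses slicing and a regular dictionary. It only keeps the most recently processed row for each name. Much simpler. I kept the itemgetters, again for readability.</p>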
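<p>A sketch of that simpler approach follows. It is an illustrative reconstruction of the idea, not the original code, and it assumes the same six-column input as the deque version, with Date at index 1, Name at index 2, and Score/Parameter at indices 3 and 4. Each processed row already carries its own six look-back columns. The next row's look-back columns are therefore the previous row's Score and Parameter followed by the first four of those extra columns, so storing one row per name is enough. The function name <code>process_rows</code> is hypothetical.</p>

```python
from operator import itemgetter

# Column accessors (0-based), matching the deque version.
name = itemgetter(2)
date = itemgetter(1)
score_param = itemgetter(slice(3, 5))  # [Score, Parameter]
sixplus = itemgetter(slice(6, None))   # the six look-back columns this script adds

# Placeholders used until a name has three earlier dates.
DEFAULT = ['x', 'y', 'x', 'y', 'x', 'y']


def process_rows(rows):
    """Return copies of rows, each extended with six look-back columns.

    Hypothetical helper; assumes every input row has exactly six columns.
    """
    previous = {}  # name -> last processed (already extended) row
    out = []
    for row in rows:
        row = list(row)  # copy so the caller's rows are not mutated
        prev = previous.get(name(row))
        if prev is None:
            extra = DEFAULT
        elif date(prev) == date(row):
            # same date as last time: reuse its look-back columns
            extra = sixplus(prev)
        else:
            # newest Score/Parameter first, then shift the older pairs right,
            # dropping the oldest pair
            extra = score_param(prev) + sixplus(prev)[:4]
        row.extend(extra)
        out.append(row)
        previous[name(row)] = row
    return out
```

Each returned row can be handed to <code>writer.writerow()</code> exactly as in the deque version.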
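<p>For similar kinds of processing, I have found that accumulating rows and writing them out in chunks, rather than one at a time, can improve performance considerably. Reading the whole data file at once, when it fits in memory, can also help.</p>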
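<p>A minimal sketch of both ideas, written against Python 3's <code>csv</code> module, where files are opened in text mode with <code>newline=''</code> rather than <code>'wb'</code> as in the 2.7 code. The helper name <code>write_in_chunks</code> and the chunk size are arbitrary choices for illustration. How much batching helps depends on the data and the platform, so it is worth timing on your own files. In-memory <code>io.StringIO</code> objects stand in for the real files so the example is self-contained.</p>

```python
import csv
import io


def write_in_chunks(rows, outfile, chunk_size=1000):
    """Buffer rows and pass them to csv.writer.writerows() in batches.

    Hypothetical helper; chunk_size is an arbitrary illustrative default.
    """
    writer = csv.writer(outfile)
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= chunk_size:
            writer.writerows(batch)
            batch = []
    if batch:  # flush whatever is left over
        writer.writerows(batch)


# Reading the whole input at once: materialize every row in one go.
# With a real file this would be open('data.txt', newline='').
source = io.StringIO("a,1\nb,2\nc,3\n")
all_rows = list(csv.reader(source))

# With a real file this would be open('processed.txt', 'w', newline='').
out = io.StringIO()
write_in_chunks(all_rows, out, chunk_size=2)
```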