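<p>Your file appears to be in date order. If we take the last entry per name for each date and add it to a fixed-size deque for that name, while writing out each row as we go, then this should do the trick:</p>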
<pre><code>import csv
from collections import deque, defaultdict
from itertools import chain, islice, groupby
from operator import itemgetter

# defaultdict whose first access of a key creates a deque with maxlen 3,
# pre-filled with [['x', 'y'], ['x', 'y'], ['x', 'y']].
# Deques are efficient at head/tail manipulation, so an insert at the start
# is cheap, and with a fixed size the extra element "falls off" the end.
names_previous = defaultdict(lambda: deque([['x', 'y']] * 3, 3))

# Python 3: the csv module expects text-mode files opened with newline=''
# (under Python 2 these would be opened with 'rb' and 'wb').
with open('sample.csv', newline='') as fin, \
        open('sample_new.csv', 'w', newline='') as fout:
    csvin = csv.reader(fin)
    csvout = csv.writer(fout)
    # Use groupby to detect changes in the date column. Since the data is
    # always ascending, rows with the same date are contiguous, which lets us
    # identify the rows within the *same* date.
    # date = the date we're looking at, rows = an iterable of rows in that date.
    # islice(csvin, 1, None) skips the header row.
    for date, rows in groupby(islice(csvin, 1, None), itemgetter(1)):
        # Once this date is processed, we need to know which data to carry
        # forward for the names seen in it. Currently that is taken from the
        # last occurring row for each name.
        to_add = {}
        for row in rows:
            # Output the row from the file plus a *flattened* version of the
            # extra data (previous items) we wish to append, e.g.
            # [['x', 'y'], ['x', 'y'], ['x', 'y']] becomes
            # ['x', 'y', 'x', 'y', 'x', 'y'].
            # So we can store 3 pairs of data but flatten them into one long
            # list of 6 items.
            # If the name (row[2]) doesn't exist yet, the lookup makes the
            # defaultdict create the default entry described above.
            csvout.writerow(row + list(chain.from_iterable(names_previous[row[2]])))
            # Store the data that should be included for this name in the
            # next date group: the values from the last occurrence of the
            # name in this date. If it appears more than once, only the last
            # occurrence is kept.
            # NB: to keep more than one item per name, you could build a
            # deque here within the date group instead.
            to_add[row[2]] = row[3:5]
        for key, val in to_add.items():
            # The date is finished, so before processing the next one, update
            # the previous data for each name by pushing a single item to the
            # front of its deque. If we were storing multiple items in the
            # loop above, we could use .extendleft() instead to insert more
            # than one set of data.
            names_previous[key].appendleft(val)
</code></pre>
<p>During the run, this keeps only the names and the last 3 values for each name in memory.</p>
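<p>To see why memory stays bounded, here is a minimal Python 3 sketch of how a deque with a maximum length of 3 behaves under <code>appendleft</code>. The pairs pushed in are made-up values, not data from your file:</p>

```python
from collections import deque

# Fixed-size history: with maxlen=3 only the three newest pairs are kept.
history = deque([['x', 'y']] * 3, maxlen=3)

# Hypothetical value pairs, pushed oldest first.
for pair in (['1', '2'], ['3', '4'], ['5', '6'], ['7', '8']):
    # The newest pair goes to the front; the oldest drops off the back.
    history.appendleft(pair)

print(list(history))  # [['7', '8'], ['5', '6'], ['3', '4']]
```

<p>The placeholder pairs and the oldest real pair have all been discarded, so each name never holds more than three entries.</p>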
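<p>You may wish to adjust this to carry over the existing headers and write new ones for the added columns, rather than just skipping the header row on input.</p>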
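<p>One possible way to handle the headers, as a rough Python 3 sketch: read the header row with <code>next()</code> and write it back out with extra column names. The helper <code>extended_header</code>, the generated names (<code>prev1_a</code>, <code>prev1_b</code>, ...) and the sample columns are all assumptions for illustration, since the real header of <code>sample.csv</code> isn't shown:</p>

```python
import csv
import io

def extended_header(header, n_previous=3, fields=('a', 'b')):
    """Return the input header plus one column per stored previous value.

    The generated names (prev1_a, prev1_b, ...) are placeholders only.
    """
    extra = ['prev{}_{}'.format(i, f)
             for i in range(1, n_previous + 1)
             for f in fields]
    return list(header) + extra

# Hypothetical input standing in for sample.csv.
sample = io.StringIO("id,date,name,a,b\n1,2013-01-01,bob,5,6\n")
csvin = csv.reader(sample)
out = io.StringIO()
csvout = csv.writer(out)

# Consume the header row explicitly instead of skipping it with islice.
header = next(csvin)
csvout.writerow(extended_header(header))
```

<p>With the header consumed this way, the <code>groupby</code> loop would iterate over <code>csvin</code> directly rather than over <code>islice(csvin, 1, None)</code>.</p>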