<p>You can process the file in chunks with the <a href="https://stackoverflow.com/a/17508761/190597">utility function <code>open_chunk</code></a>:</p>
<pre><code>import re
import subprocess


def open_chunk(readfunc, delimiter, chunksize=1024):
    """
    Yield the pieces of the input that lie between matches of delimiter.

    readfunc(chunksize) should return a string ('' at end of input).
    """
    remainder = ''
    for chunk in iter(lambda: readfunc(chunksize), ''):
        pieces = re.split(delimiter, remainder + chunk)
        for piece in pieces[:-1]:
            yield piece
        # The last piece may be incomplete, so carry it over to the next chunk.
        remainder = pieces[-1]
    if remainder:
        yield remainder


filename = 'data'  # path to the data file

with open(filename, 'r') as f:
    for chunk in open_chunk(f.read, delimiter=r'-{45,}'):
        chunk = chunk.strip()
        if chunk:
            lines = chunk.splitlines()
            firstline = lines[0]
            # e.g. "TM 05120970.01: Processing..." gives "05120970.01"
            car_number = firstline.split()[1][:-1]
            for line in lines[1:]:
                if 'Owner_Info.User_ref = ' in line:
                    owner_user = line.split(" = ")[1]
                elif 'CarModel = ' in line:
                    car_model = line.split(" = ")[1]
            cmd = ['insert_owner_car.pl',
                   '-id',
                   car_number,
                   '-o',
                   'owner_user="%s"' % (owner_user, ),
                   'car_model="%s"' % (car_model, ),
                   'priority="Unknown"']
            print(' '.join(cmd))
            # subprocess.call(cmd)
</code></pre>
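<p>For each record, this prints the <code>insert_owner_car.pl</code> command line that would be run. Uncomment the <code>subprocess.call(cmd)</code> line to actually run it.</p>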
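<p>As a quick check that needs no file on disk, here is a minimal sketch that runs the same logic on an in-memory string. The sample text is an assumption (abbreviated records separated by lines of exactly 45 dashes), and <code>build_cmd</code> is a hypothetical helper that wraps the parsing loop above. It is not part of the linked answer.</p>

```python
import io
import re


def open_chunk(readfunc, delimiter, chunksize=1024):
    """Yield the pieces between delimiter matches, reading chunksize characters at a time."""
    remainder = ''
    for chunk in iter(lambda: readfunc(chunksize), ''):
        pieces = re.split(delimiter, remainder + chunk)
        for piece in pieces[:-1]:
            yield piece
        remainder = pieces[-1]
    if remainder:
        yield remainder


def build_cmd(chunk):
    """Hypothetical helper: turn one non-empty record into the command's argument list."""
    lines = chunk.splitlines()
    car_number = lines[0].split()[1][:-1]  # drop the trailing ':'
    owner_user = car_model = None
    for line in lines[1:]:
        if 'Owner_Info.User_ref = ' in line:
            owner_user = line.split(' = ')[1]
        elif 'CarModel = ' in line:
            car_model = line.split(' = ')[1]
    return ['insert_owner_car.pl', '-id', car_number, '-o',
            'owner_user="%s"' % owner_user,
            'car_model="%s"' % car_model,
            'priority="Unknown"']


# Assumed sample input: abbreviated records, separator lines of exactly 45 dashes.
SEPARATOR = '-' * 45
SAMPLE = '\n'.join([
    SEPARATOR,
    'TM 05120970.01: Processing...',
    'TM 05120970: current status Open',
    'TM 05120970: Owner_Info.User_ref = crossi14',
    'TM 05120970: CarModel = Nissan Micra',
    SEPARATOR,
    'TM 05157414.06: Processing...',
    'TM 05157414: current status Open',
    'TM 05157414: Owner_Info.User_ref = yumiao12',
    'TM 05157414: CarModel = Toyota Avensis',
    SEPARATOR,
    '',
])

commands = []
# io.StringIO(...).read stands in for f.read on a real file.
for chunk in open_chunk(io.StringIO(SAMPLE).read, delimiter=r'-{45,}'):
    chunk = chunk.strip()
    if chunk:
        commands.append(' '.join(build_cmd(chunk)))

for command in commands:
    print(command)
# insert_owner_car.pl -id 05120970.01 -o owner_user="crossi14" car_model="Nissan Micra" priority="Unknown"
# insert_owner_car.pl -id 05157414.06 -o owner_user="yumiao12" car_model="Toyota Avensis" priority="Unknown"
```

<p>Passing <code>io.StringIO(SAMPLE).read</code> works because <code>open_chunk</code> only needs a callable that returns a string of up to <code>chunksize</code> characters and an empty string at the end of the input.</p>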
<hr/>
<p>If the data file is small, you can read the entire file into a single string and then split it into chunks with <code>re.split</code>:</p>
<pre><code>In [37]: import re
In [38]: re.split(r'-{45,}', open('data').read())
Out[38]:
['\n\n',
'\nTM 05120970.01: Processing...\nTM 05120970: Processing...\nTM 05120970: current status Open\nTM 05120970: Owner_Info.User_ref = crossi14\nTM 05120970: Owner_Info.Email = Criss.Rossi@gmail.com\nTM 05120970: CarModel = Nissan Micra\n',
'\nTM 05157414.06: Processing...\nTM 05157414: Processing...\nTM 05157414: current status Open\nTM 05157414: Owner_Info.User_ref = yumiao12\nTM 05157414: Owner_Info.Email = Yu.Miao@gmail.com\nTM 05157414: CarModel = Toyota Avensis\n',
'\n']
</code></pre>
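<p>This can be used in place of <code>open_chunk</code> above. The advantage of <code>open_chunk</code> is that it also works on very large files, where reading the entire file into one string and splitting it into a list would require too much memory.</p>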
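<p>A minimal sketch of the whole-file variant, assuming a small input. The text below is a made-up miniature of the data file with abbreviated records. Note that <code>re.split</code> also returns the whitespace before the first separator and after the last one, so whitespace-only pieces have to be dropped, just as the chunked loop does with <code>strip()</code>.</p>

```python
import re

# Assumed miniature of the data file: abbreviated records, 45-dash separators.
SEPARATOR = '-' * 45
text = ('\n\n' + SEPARATOR +
        '\nTM 05120970.01: Processing...\nTM 05120970: CarModel = Nissan Micra\n' + SEPARATOR +
        '\nTM 05157414.06: Processing...\nTM 05157414: CarModel = Toyota Avensis\n' + SEPARATOR +
        '\n')

# With a real file, this would be: text = open('data').read()
pieces = re.split(r'-{45,}', text)

# Keep only the pieces that contain something other than whitespace.
records = [piece.strip() for piece in pieces if piece.strip()]

print(len(pieces), len(records))
for record in records:
    print(record.splitlines()[0])
# 4 2
# TM 05120970.01: Processing...
# TM 05157414.06: Processing...
```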