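<p><strong>Update:</strong> Since you don't have newlines to split the file on, you're best off just slurping the file, splitting it appropriately, and writing a new file. A simple solution would be:</p>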
<pre><code>import os, tempfile

with open('file.txt') as f, \
     tempfile.NamedTemporaryFile('w', dir='.', delete=False) as tf:
    # A space precedes only the second copy, so it's a useful partition point
    firstcopy, _, _ = f.read().partition(' Story1: ')
    # Write first copy
    tf.write(firstcopy)
# Exiting the with block closes the temporary file, so the data is there
# Atomically replace original file with rewritten temporary file
os.replace(tf.name, 'file.txt')
</code></pre>
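<p>To see why <code>' Story1: '</code> (with the leading space) works as a split point, here is a minimal sketch of how <code>str.partition</code> behaves. The sample strings are made up for illustration and are not taken from your file:</p>

```python
# Hypothetical sample: two copies, with a space only before the second one
text = "Story1: once upon a time. Story1: once upon a time."

# partition splits at the first occurrence of the separator only
first, sep, rest = text.partition(' Story1: ')

# When the separator is absent, the whole string comes back as the first element
whole, no_sep, no_rest = "Story1: only one copy.".partition(' Story1: ')
```

<p>In the second case <code>whole</code> is the entire input, so running the rewrite on a file that has no duplicate should leave its contents unchanged.</p>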
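<p>Technically, this isn't completely safe against an actual power loss, since the data might not have been written to disk before the metadata update occurs. If you're paranoid, tweak it to explicitly block until the data is synced by adding the following two lines just before leaving the <code>with</code> block (right after the <code>write</code> call):</p>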
<pre><code>    tf.flush()             # Flushes Python level buffers to OS
    os.fsync(tf.fileno())  # Flush OS kernel buffer out to disk, block until done
</code></pre>
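<p>Put together, the whole thing might look like the sketch below. The function name <code>keep_first_copy</code> and its <code>marker</code> parameter are my own inventions for illustration; the temporary file is created in the same directory as the target so the final rename stays on one filesystem:</p>

```python
import os
import tempfile


def keep_first_copy(path, marker=' Story1: '):
    """Rewrite path so it holds only the text before the first marker."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path) as f, \
         tempfile.NamedTemporaryFile('w', dir=directory, delete=False) as tf:
        firstcopy, _, _ = f.read().partition(marker)
        tf.write(firstcopy)
        tf.flush()              # Flush Python-level buffers to the OS
        os.fsync(tf.fileno())   # Block until the OS has written the data to disk
    # Temporary file is closed here; swap it in for the original
    os.replace(tf.name, path)
```

<p>Calling <code>keep_first_copy('file.txt')</code> would then do the same job as the snippets above, with the sync step included.</p>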
<hr/>
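<p><strong>Old answer, for the case where the copies begin on separate lines:</strong></p>
<p>Find the position where the second copy begins, and truncate the file there:</p>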
<pre><code>seen_story1 = False
with open('file.txt', 'r+') as f:
    while True:
        pos = f.tell()  # Record position before next line
        line = f.readline()
        if not line:
            break  # Hit EOF
        if line.startswith('Story1:'):
            if seen_story1:
                # Seen it already, we're in duplicate territory
                f.seek(pos)   # Go back to end of last line
                f.truncate()  # Truncate file
                break         # We're done
            else:
                seen_story1 = True  # Seeing it for the first time
</code></pre>
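<p>Since all you're doing is removing duplicate information from the end of the file, this is safe and efficient; <code>truncate</code> should be atomic on most OSes, so the trailing data is freed all at once, with no risk of partial-write corruption or the like.</p>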
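<p>If you want this as a reusable helper, a variant along these lines should work. It opens the file in binary mode so that <code>tell()</code> returns plain byte offsets, and it reports how many bytes were dropped. The name <code>truncate_at_second_marker</code> and the return value are my additions, not something from your code:</p>

```python
import os


def truncate_at_second_marker(path, marker=b'Story1:'):
    """Cut the file at the second line starting with marker; return bytes removed."""
    seen = False
    with open(path, 'rb+') as f:
        while True:
            pos = f.tell()           # Byte offset of the line about to be read
            line = f.readline()
            if not line:
                return 0             # Hit EOF without finding a second copy
            if line.startswith(marker):
                if seen:
                    size = os.fstat(f.fileno()).st_size
                    f.truncate(pos)  # Drop everything from the duplicate onward
                    return size - pos
                seen = True
```

<p>A return value of <code>0</code> means no duplicate was found and the file was left alone.</p>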