Removing duplicated file content

2 answers

User

#1 · edited 2024-09-26 22:52:19

You can use the find method.

# Set the word you want to look for
myword = "Story1"

# Read the file into a variable called text
with open('file.txt', 'r') as fin:
    text = fin.read()

# Find the word for the first time. find() returns the lowest index of the
# substring if it is found, so we add the length of the word to get the
# position just past the first match.
index_first_time_found = text.find(myword) + len(myword)

# Search again, but now start looking from the end of the first match.
index_second_time_found = text.find(myword, index_first_time_found)

# Keep everything up to the second occurrence. find() returns -1 when there
# is no second occurrence; in that case keep the whole text.
if index_second_time_found == -1:
    new_text = text
else:
    new_text = text[:index_second_time_found]

print(new_text)
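
The same find logic can be wrapped in a small reusable helper. This is only a minimal sketch: the function name keep_first_copy and the sample string are illustrative assumptions, not part of the original answer.

```python
def keep_first_copy(text, marker):
    """Return text up to (not including) the second occurrence of marker.

    If marker occurs fewer than two times, text is returned unchanged.
    """
    first = text.find(marker)
    if first == -1:
        return text  # marker never appears
    # Start the second search just past the first match.
    second = text.find(marker, first + len(marker))
    if second == -1:
        return text  # only one copy, nothing to remove
    return text[:second]


# Hypothetical sample data with the content repeated once.
sample = "Story1: once upon a time. Story1: once upon a time."
print(keep_first_copy(sample, "Story1"))
```

Returning the text unchanged when there is no second match avoids the text[:-1] slice that a raw -1 from find would otherwise produce.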

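User

#2 · edited 2024-09-26 22:52:19

Update: since you don't have newlines to split the file on, you're best off just slurping the file, splitting it appropriately, and writing a new file. A simple solution: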

import os, tempfile

with open('file.txt') as f,\
     tempfile.NamedTemporaryFile('w', dir='.', delete=False) as tf:
    # There is a space only before the second copy, so it's a useful partition point
    firstcopy, _, _ = f.read().partition(' Story1: ')
    # Write the first copy
    tf.write(firstcopy)
# Exiting the with block closes the temporary file, so the data is there
# Atomically replace the original file with the rewritten temporary file
os.replace(tf.name, 'file.txt')

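Technically, this isn't completely safe against actual power loss, because the data might not be written to disk before the metadata update happens. If you're paranoid, tweak it to explicitly block until the data is synced by adding the following two lines at the end of the with block (after the write, before dedenting):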

    tf.flush()  # Flushes Python level buffers to OS
    os.fsync(tf.fileno())  # Flush OS kernel buffer out to disk, block until done
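
To make the placement of those two lines concrete, here is a sketch of the complete rewrite with the sync step included. The function name rewrite_keeping_first_copy and its parameters are hypothetical, introduced only for illustration. It assumes, as above, that the separator first appears immediately before the second copy.

```python
import os
import tempfile


def rewrite_keeping_first_copy(path, separator):
    """Rewrite path so it holds only the text before the first separator."""
    # Create the temporary file next to the original so os.replace stays
    # on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    with open(path) as f, \
         tempfile.NamedTemporaryFile('w', dir=directory, delete=False) as tf:
        firstcopy, _, _ = f.read().partition(separator)
        tf.write(firstcopy)
        tf.flush()             # Flush Python-level buffers to the OS
        os.fsync(tf.fileno())  # Block until the OS has written the data to disk
    # The temporary file is closed here; swap it in for the original.
    os.replace(tf.name, path)
```

A call such as rewrite_keeping_first_copy('file.txt', ' Story1: ') would correspond to the snippet above. If the separator is absent, partition returns the whole text and the file is rewritten unchanged.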

Old answer, for the case where the copy starts on a separate line:

Find where the second copy starts, and truncate the file:

seen_story1 = False
with open('file.txt', 'r+') as f:
    while True:
        pos = f.tell() # Record position before next line

        line = f.readline()
        if not line:
            break  # Hit EOF

        if line.startswith('Story1:'):
            if seen_story1:
                # Seen it already, we're in duplicate territory
                f.seek(pos)   # Go back to end of last line
                f.truncate()  # Truncate file
                break         # We're done
            else:
                seen_story1 = True  # Seeing it for the first time

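Since all you're doing is removing duplicated information from the end of the file, this is safe and efficient; truncate should be atomic on most operating systems, so the trailing data is released all at once, with no risk of partial-write corruption or the like.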
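
For trying this out on a scratch file, the loop can be wrapped in a function. This is a sketch only: the name truncate_at_second_marker and its return value are assumptions added for illustration, and it relies on the marker starting a line, as in the old answer.

```python
def truncate_at_second_marker(path, marker):
    """Truncate path at the second line that starts with marker.

    Returns True if the file was truncated, False if fewer than two
    lines start with marker.
    """
    seen = False
    with open(path, 'r+') as f:
        while True:
            pos = f.tell()       # Position before reading the next line
            line = f.readline()  # readline (not iteration) keeps tell() usable
            if not line:
                return False     # Hit EOF without finding a second copy
            if line.startswith(marker):
                if seen:
                    f.seek(pos)   # Back to the start of the duplicate
                    f.truncate()  # Drop everything from here on
                    return True
                seen = True
```

Calling truncate_at_second_marker('file.txt', 'Story1:') mirrors the loop above. The boolean result makes it easy to tell whether anything was removed.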
