Removing duplicated file content

Posted 2024-09-26 22:52:19

I previously wrote a file with Python, and when I ran the script a second time, the same content was written to it twice.

Here is the content of my file:

Story1: A short story is a piece of prose fiction that typically can be read in one sitting and focuses on a self-contained incident or series of linked incidents, with the intent of evoking a "single effect" or mood, however there are many exceptions to this. A dictionary definition is "an invented prose narrative shorter than a novel usually dealing with a few characters and aiming at unity of effect and often concentrating on the creation of mood rather than plot. Story1: A short story is a piece of prose fiction that typically can be read in one sitting and focuses on a self-contained incident or series of linked incidents, with the intent of evoking a "single effect" or mood, however there are many exceptions to this. A dictionary definition is "an invented prose narrative shorter than a novel usually dealing with a few characters and aiming at unity of effect and often concentrating on the creation of mood rather than plot.

I tried Python's set like this, but it does not work in my case:

uniqlines = set(open('file.txt').readlines())
bar = open('file', 'w').writelines(set(uniqlines))

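In my case there are no newlines, so the whole content is read as a single line. I want to delete everything from the point where "Story1:" is encountered the second time. How can I do that?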


2 Answers

You can use the find method.

# Set the word you want to look for
myword = "Story1"

# Read the whole file into a variable called text
with open('file.txt') as fin:
    text = fin.read()

# Find the word for the first time. find() returns the lowest index of the
# substring if it is found, so add the length of the word to move past it.
index_first_time_found = text.find(myword) + len(myword)

# Search again, this time starting from the end of the first match.
index_second_time_found = text.find(myword, index_first_time_found)

# Cut off everything from the second occurrence onward. find() returns -1
# when there is no second occurrence; in that case keep the text unchanged.
new_text = text if index_second_time_found == -1 else text[:index_second_time_found]

print(new_text)

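Update: Since you don't have newlines to split the file on, you are best off slurping the whole file, splitting it appropriately, and writing a new file. A simple solution is: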

import os, tempfile

with open('file.txt') as f,\
     tempfile.NamedTemporaryFile('w', dir='.', delete=False) as tf:
    # You've got a space only before second copy, so it's a useful partition point
    firstcopy, _, _ = f.read().partition(' Story1: ')
    # Write first copy
    tf.write(firstcopy)
# Exiting with block closes temporary file so data is there
# Atomically replace original file with rewritten temporary file
os.replace(tf.name, 'file.txt')

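Technically, this is not completely safe against an actual power loss, because the data may not have been written to disk before the metadata update happens. If you are paranoid, adjust it to block explicitly until the data is synced by adding the following two lines inside the with block, after the write and before the block exits: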

    tf.flush()  # Flushes Python level buffers to OS
    os.fsync(tf.fileno())  # Flush OS kernel buffer out to disk, block until done
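
Putting the pieces together, here is a minimal sketch of the rewrite with the two sync lines in place. The function name keep_first_copy and its separator parameter are hypothetical, chosen only for illustration. It assumes the file fits in memory and that the second copy is preceded by a single space, as in the question.

```python
import os
import tempfile


def keep_first_copy(path, separator=" Story1: "):
    """Rewrite `path` so it holds only the text before the first `separator`.

    Hypothetical helper for illustration; the whole file is read into memory.
    """
    # Create the temporary file next to the original so os.replace stays on
    # the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    with open(path) as f, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as tf:
        # Everything before the separator is the first copy; if the separator
        # is absent, partition returns the whole text unchanged.
        first_copy, _, _ = f.read().partition(separator)
        tf.write(first_copy)
        tf.flush()             # Flush Python-level buffers to the OS
        os.fsync(tf.fileno())  # Block until the OS has written the data to disk
    # Both files are closed here; swap the rewritten file into place.
    os.replace(tf.name, path)
```

Calling keep_first_copy('file.txt') would then leave only the first copy in the file. If the separator never occurs, the file is rewritten with its original content.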

Old answer, for the case where the duplicate starts on its own line:

Find where the second copy starts, and truncate the file:

seen_story1 = False
with open('file.txt', 'r+') as f:
    while True:
        pos = f.tell() # Record position before next line

        line = f.readline()
        if not line:
            break  # Hit EOF

        if line.startswith('Story1:'):
            if seen_story1:
                # Seen it already, we're in duplicate territory
                f.seek(pos)   # Go back to end of last line
                f.truncate()  # Truncate file
                break         # We're done
            else:
                seen_story1 = True  # Seeing it for the first time

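Since all you are doing is removing duplicate information from the end of the file, this is safe and efficient; truncate should be atomic on most operating systems, so the trailing data is freed all at once, with no risk of partial-write corruption and the like.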
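
The loop above relies on the duplicate starting on its own line, which is not the case for the file in the question. Below is a minimal sketch of the same truncate-in-place idea without newlines. It is not part of the original answer, and the function name truncate_at_second_marker is hypothetical. It assumes the file fits in memory and opens it in binary mode so that the index returned by find is a real byte offset that can be passed to truncate.

```python
def truncate_at_second_marker(path, marker=b"Story1:"):
    """Truncate `path` in place just before the second `marker`.

    Hypothetical helper for illustration. Returns True if the file was
    truncated, False if the marker occurs fewer than two times.
    """
    with open(path, "r+b") as f:
        data = f.read()
        first = data.find(marker)
        if first == -1:
            return False  # Marker never occurs
        second = data.find(marker, first + len(marker))
        if second == -1:
            return False  # No duplicate to remove
        # Also drop the whitespace that separates the two copies
        cut = len(data[:second].rstrip())
        f.truncate(cut)  # Discard everything from byte offset `cut` onward
        return True
```

Binary mode is used here because offsets reported in text mode are opaque and are not guaranteed to match string indices.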
