Python: in a file of comma-separated values, how do I check for repeated values across lines and delete the duplicate lines?

Posted 2024-06-02 22:13:35


I have a txt file in the following format:

  • 01,Spain
  • 02,USA
  • 03,India
  • 01,Italy
  • 01,Portugal
  • 04,Brazil

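I need to check whether the numbers are repeated. In this example, "01" appears for Spain, Italy, and Portugal. If two or more lines share the same number, I only need to keep the first occurrence of that number and remove the other occurrences. The file should then contain: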

  • 01,Spain
  • 02,USA
  • 03,India
  • 04,Brazil

3 answers
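import os

numbers = set()
# Write the lines to keep into a temporary file, then swap it in for the original.
with open("file.txt", "r") as infile, open("_file.txt", "w") as outfile:
    for line in infile:
        tokens = line.split(',')
        number = int(tokens[0])  # note: int() treats "01" and "1" as the same key
        if number not in numbers:
            numbers.add(number)
            outfile.write(line)
# os.replace overwrites the destination, so no separate os.remove is needed.
os.replace("_file.txt", "file.txt")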
# Read your entire file into memory.
my_file = 'my_file.txt'
with open(my_file) as f_in:
    content = f_in.readlines()

# Keep track of the numbers that have already appeared
# while rewriting the content back to your file.
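numbers = set()  # a set gives fast membership checks
with open(my_file, 'w') as f_out:
    for line in content:
        # Split only on the first comma so a comma in the country name is harmless.
        number = line.split(',', 1)[0]
        if number not in numbers:
            f_out.write(line)
            numbers.add(number)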

I hope this is the easiest one to understand.

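# The built-in set replaces the old `sets` module, which does not exist in Python 3.
seen = set()
# Each file needs its own `as` target in the with statement.
with open('in.txt', 'r') as fr, open('out.txt', 'w') as fw:
    for line in fr:
        row = line.split(',')
        if row[0] not in seen:
            fw.write(line)
            seen.add(row[0])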
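
The same first-occurrence rule can also be wrapped in a small function that works on any iterable of lines, which makes it easy to check against the example from the question without touching the disk. This is only a minimal sketch: the name `dedupe_by_first_field` is hypothetical (it does not come from any of the answers), the key is assumed to be the text before the first comma, and blank lines are assumed to be skippable.

```python
def dedupe_by_first_field(lines, sep=","):
    """Yield only the first line seen for each leading field.

    Hypothetical helper: the key is the text before the first `sep`,
    with surrounding whitespace stripped. Blank lines are skipped,
    which is an assumption rather than part of the question.
    """
    seen = set()
    for line in lines:
        if not line.strip():
            continue
        key = line.split(sep, 1)[0].strip()
        if key not in seen:
            seen.add(key)
            yield line


# The sample data from the question, one string per line of the file.
sample = [
    "01,Spain\n",
    "02,USA\n",
    "03,India\n",
    "01,Italy\n",
    "01,Portugal\n",
    "04,Brazil\n",
]
result = list(dedupe_by_first_field(sample))
print("".join(result), end="")
```

Because an open file object is itself an iterable of lines, it could be passed in directly, for example `outfile.writelines(dedupe_by_first_field(infile))`.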
