Removing duplicate rows from a CSV file with awk/python

Posted 2024-07-04 08:55:54


I have a file, file1.csv, whose lines look like this:

adx,999-99-7708
ada,999-99-8101
ad1,999-99-8342
zda,103-54-7013
ad1,999-99-8591

file2.csv has lines like these:

1967-05-08,583-50-3833,Trac,Mich,Ewell,3000,Cumming,3830 Man Rid Driv,tracey@gmail.com,(111) 123-4567,0123,GA,339061
1988-03-27,103-54-7013,Mar,Grac,Vea,30004,Au,2549 Walt Wa Apt D1,m@augu.edu,(706) 916-4817,021341,GA,339060
1973-11-16,183-54-5013,Carl,,Thom,30093,Norcross,1021 Ri Rid Drive,,,,,339059

Desired output:

1967-05-08,583-50-3833,Trac,Mich,Ewell,3000,Cumming,3830 Man Rid Driv,tracey@gmail.com,(111) 123-4567,0123,GA,339061
1973-11-16,183-54-5013,Carl,,Thom,30093,Norcross,1021 Ri Rid Drive,,,,,339059 

I tried:

awk -F, 'NR==FNR{a[$2]++; next} !a[$2]{print}' file1.txt file2.txt 

It is supposed to drop every line of file2.txt whose second field also appears in file1.txt, but I still get this output:

1967-05-08,583-50-3833,Trac,Mich,Ewell,3000,Cumming,3830 Man Rid Driv,tracey@gmail.com,(111) 123-4567,0123,GA,339061
1988-03-27,103-54-7013,Mar,Grac,Vea,30004,Au,2549 Walt Wa Apt D1,m@augu.edu,(706) 916-4817,021341,GA,339060
1973-11-16,183-54-5013,Carl,,Thom,30093,Norcross,1021 Ri Rid Drive,,,,,339059

even though line 2, the one with 103-54-7013, should have been removed. What am I doing wrong with awk?
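The awk command works in two passes: while NR==FNR (that is, while reading the first file) it records field 2 of each line, and for the second file it prints only the lines whose field 2 was never recorded. A minimal Python sketch of the same two-pass logic, run on the sample data from the question, is shown below. It assumes plain comma-separated input with no quoted fields; FILE1, FILE2 and filter_lines are hypothetical names used only for illustration.

```python
# Sample contents taken from the question, kept in memory for illustration.
FILE1 = """adx,999-99-7708
ada,999-99-8101
ad1,999-99-8342
zda,103-54-7013
ad1,999-99-8591
"""

FILE2 = """1967-05-08,583-50-3833,Trac,Mich,Ewell,3000,Cumming,3830 Man Rid Driv,tracey@gmail.com,(111) 123-4567,0123,GA,339061
1988-03-27,103-54-7013,Mar,Grac,Vea,30004,Au,2549 Walt Wa Apt D1,m@augu.edu,(706) 916-4817,021341,GA,339060
1973-11-16,183-54-5013,Carl,,Thom,30093,Norcross,1021 Ri Rid Drive,,,,,339059
"""


def filter_lines(ids_text: str, data_text: str) -> list[str]:
    """Two-pass filter analogous to awk -F, 'NR==FNR{a[$2]++; next} !a[$2]'."""
    # Pass 1 (NR==FNR in awk): record field 2 of every line of the first file.
    seen = set()
    for line in ids_text.splitlines():
        fields = line.split(",")
        if len(fields) > 1:
            seen.add(fields[1])
    # Pass 2: keep lines of the second file whose field 2 was not recorded.
    kept = []
    for line in data_text.splitlines():
        fields = line.split(",")
        if len(fields) < 2 or fields[1] not in seen:
            kept.append(line)
    return kept


for line in filter_lines(FILE1, FILE2):
    print(line)
```

On clean input this keeps only the 583-50-3833 and 183-54-5013 rows, which matches the desired output; so if the real awk run keeps the 103-54-7013 row, the keys in the real files probably differ from what is visible on screen.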


2 answers

A variant in Python using the csv module:

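import csv

# Collect the IDs (second column) listed in file1.csv
with open("file1.csv", newline="") as if1:
    removals = {row[1] for row in csv.reader(if1) if len(row) > 1}

# Read all of file2.csv before reopening it for writing: opening it with 'w'
# truncates it, so a lazy reader on the same file would see no rows.
with open("file2.csv", newline="") as if2:
    rows = [row for row in csv.reader(if2) if len(row) < 2 or row[1] not in removals]

with open("file2.csv", "w", newline="") as of2:
    csv.writer(of2).writerows(rows)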
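Overwriting file2.csv in place is the delicate part: opening it with 'w' truncates it immediately, so its rows must be read before it is reopened for writing. One alternative, not part of the original answer, is to stream the kept rows into a temporary file in the same directory and then swap it into place with os.replace, which avoids holding the whole file in memory. A minimal sketch, where remove_listed_ids is a hypothetical helper name:

```python
import csv
import os
import tempfile


def remove_listed_ids(ids_path: str, data_path: str) -> None:
    """Drop rows of data_path whose 2nd column appears in the 2nd column of ids_path."""
    with open(ids_path, newline="") as f:
        removals = {row[1] for row in csv.reader(f) if len(row) > 1}

    # Create the temporary file next to data_path so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(data_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".csv")
    try:
        # fdopen is entered first so the descriptor is closed even if data_path fails to open.
        with os.fdopen(fd, "w", newline="") as dst, open(data_path, newline="") as src:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                if len(row) < 2 or row[1] not in removals:
                    writer.writerow(row)
        os.replace(tmp_path, data_path)  # swap the filtered copy into place
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temporary file
        raise


# Usage: remove_listed_ids("file1.csv", "file2.csv")
```

Two side effects to be aware of: csv.writer terminates rows with "\r\n" by default (pass lineterminator="\n" if that matters), and mkstemp creates the file with owner-only permissions, so the replaced file may not keep the original file's permissions.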

This awk command works:

awk -F, 'NR==FNR{seen[$2]; next} !($2 in seen)' f1 f2 

It prints:

1967-05-08,583-50-3833,Trac,Mich,Ewell,3000,Cumming,3830 Man Rid Driv,tracey@gmail.com,(111) 123-4567,0123,GA,339061
1973-11-16,183-54-5013,Carl,,Thom,30093,Norcross,1021 Ri Rid Drive,,,,,339059

Your awk seems to work as well. Are you sure the file is a simple CSV, with no spaces around the delimiters?
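To illustrate why stray characters would explain the symptom, here is a small Python sketch. It assumes, without confirmation from the thread, that file1 contains a space after a comma or was saved with Windows line endings, so that in awk the last field becomes e.g. "103-54-7013\r" and never matches field 2 of file2. The sketch compares keys exactly and then after stripping whitespace; second_field, kept_lines and the sample strings are hypothetical.

```python
def second_field(line: str, normalize: bool) -> str:
    fields = line.split(",")
    key = fields[1] if len(fields) > 1 else ""
    # strip() removes spaces, tabs and a trailing '\r' left over from CRLF line endings.
    return key.strip() if normalize else key


def kept_lines(ids_text: str, data_text: str, normalize: bool) -> list[str]:
    # Split on '\n' only, like awk's default record separator, so a CRLF file keeps its '\r'.
    ids_lines = [line for line in ids_text.split("\n") if line]
    data_lines = [line for line in data_text.split("\n") if line]
    seen = {second_field(line, normalize) for line in ids_lines}
    return [line for line in data_lines if second_field(line, normalize) not in seen]


# Hypothetical input: file1 saved with Windows line endings and a space after one comma.
ids_crlf = "zda,103-54-7013\r\nadx, 999-99-7708\r\n"
data = "1988-03-27,103-54-7013,Mar\n1967-05-08,583-50-3833,Trac\n"

print(len(kept_lines(ids_crlf, data, normalize=False)))  # 2: nothing removed
print(len(kept_lines(ids_crlf, data, normalize=True)))   # 1: 103-54-7013 removed
```

In awk itself, a commonly used equivalent is to strip the carriage return before the other rules, e.g. `{ sub(/\r$/, "") }` as the first rule, or to convert file1 beforehand with a tool such as dos2unix.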
