How to correctly capture two columns with repeated values from a txt file in Python?

Posted 2024-07-01 08:16:47


I'm trying to get two columns with repeated values from a text file. This text file has 1000 entries, with the following structure:

ip,country,city,latitude,longitude

Here is a real example:

179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951
177.xxx.xxx.xxx,Brazil,Maceió,-9.5934,-35.7568
.
.
.
to the end
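For reference, here is a small sketch that uses the São Paulo line above as sample input. It shows which zero-based index each field lands on after split(','). Latitude and longitude are columns[3] and columns[4], while columns[2] is the city. The FIELDS list is only an illustrative name for the header structure shown above.

```python
# Field names taken from the ip,country,city,latitude,longitude structure above
FIELDS = ['ip', 'country', 'city', 'latitude', 'longitude']

line = '179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951\n'
columns = line.strip().split(',')  # strip the newline, then split on commas

# Print each zero-based index next to the field it holds
for index, name in enumerate(FIELDS):
    print(index, name, columns[index])
```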

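I want to check whether two coordinate pairs, such as "-23.3939", "-46.4951" and "-9.5934", "-35.7568", are the same, and if they are, write the whole line to another text file. I found something on Stack Overflow that works only when I use the latitude, but I want to get and compare both latitude and longitude (original code):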

entries = []
duplicate_entries = []
with open('in.txt', 'r') as my_file:
    for line in my_file:
        columns = line.strip().split(',')
        # note: columns[2] is the city; latitude/longitude are columns[3]/columns[4]
        if columns[2] not in entries:
            entries.append(columns[2])
        else:
            duplicate_entries.append(columns[2])

if len(duplicate_entries) > 0:
    with open('out.txt', 'w') as out_file:
        with open('in.txt', 'r') as my_file:
            for line in my_file:
                columns = line.strip().split(',')
                if columns[2] in duplicate_entries:
                    print(line.strip())
                    out_file.write(line)
else:
    print("No repetitions")

To do what I want, I tried this:

entries = []
duplicate_entries = []
with open('/home/usr/python-programming/ip-infos', 'r') as arq:
    for line in arq:
        columns = line.strip().split(',')
        if columns[3] and columns[4] not in entries:
            entries.append(columns[3])
            entries.append(columns[4])
        else:
            duplicate_entries.append(columns[3])
            duplicate_entries.append(columns[4])
arq.close()

if len(duplicate_entries) > 0:
    with open('/home/usr/python-programming/suspects', 'w') as out_file:
        with open('/home/usr/python-programming/ip-infos', 'r') as arq:
            for line in arq:
                columns = line.strip().split(',')
                if columns[3] and columns[4] in duplicate_entries:
                    print(line.strip())
                    out_file.write(line)
            out_file.close()
            arq.close()
else:
    print("No repetitions")

So, depending on what the text file contains, here is the output:

179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951
177.xxx.xxx.xxx,Brazil,Maceió,-9.5934,-35.7568

>output: "No repetitions" and nothing is written to the out_file (correct)

179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951
177.xxx.xxx.xxx,Brazil,Maceió,-23.3939,-46.4951

>output: 179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951
         177.xxx.xxx.xxx,Brazil,Maceió,-23.3939,-46.4951
         (and these two lines appear in the out_file (correct))

But if the file contains this:

179.xxx.xxx.xxx,Brazil,São Paulo,-23.3938,-46.4951
177.xxx.xxx.xxx,Brazil,Maceió,-23.3939,-46.4951

>output: 179.xxx.xxx.xxx,Brazil,São Paulo,-23.3938,-46.4951
         177.xxx.xxx.xxx,Brazil,Maceió,-23.3939,-46.4951
         (and these two lines appear in the out_file (incorrect))

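Even though the longitudes are equal, "-23.3938" is different from "-23.3939". So these lines should not appear in the out_file, and the terminal should show "No repetitions". I've been trying for hours, but I'm still learning and I don't know how to do it. Can anyone help me?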


2 answers

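You can also put the lines from the file into a dictionary keyed by (lat, long). That way you get the duplicates in a single pass: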

elements = ['179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951',
'177.xxx.xxx.xxx,Brazil,Maceió,-9.5934,-35.7568',
'179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951',
'177.xxx.xxx.xxx,Brazil,Maceió,-9.5934,-35.7568',
'179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951']

uniques = {}
for line in elements:
    ip, country, city, lat, long = line.strip().split(',')
    uniques.setdefault((lat, long), []).append(line)




uniques = {('-23.3939', '-46.4951'): ['179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951',
                                      '179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951',
                                      '179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951'],
           ('-9.5934', '-35.7568'): ['177.xxx.xxx.xxx,Brazil,Maceió,-9.5934,-35.7568',
                                     '177.xxx.xxx.xxx,Brazil,Maceió,-9.5934,-35.7568']}

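At this point, the keys are the (lat, long) tuples and each value is a list of all the lines that share those coordinates. Then write out only the groups that have more than one line: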

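with open('duplicate.txt', 'w') as duplicate:
    for coord, cities in uniques.items():
        if len(cities) == 1:
            continue  # coordinates seen only once are not duplicates
        # trailing '\n' keeps the next group from starting on the same line
        duplicate.write('\n'.join(cities) + '\n')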

defaultdict from the collections module can be used instead of setdefault(k, []).
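A sketch of the same grouping idea using collections.defaultdict instead of setdefault. The function name find_duplicate_lines is hypothetical, and it assumes every non-blank line has at least five comma-separated fields with latitude and longitude at indices 3 and 4 (a city name containing a comma would shift them).

```python
from collections import defaultdict


def find_duplicate_lines(lines):
    """Group lines by their (latitude, longitude) pair and return only
    the groups that contain more than one line."""
    groups = defaultdict(list)  # a missing key starts as an empty list
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        columns = line.split(',')
        groups[(columns[3], columns[4])].append(line)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == '__main__':
    # The "incorrect" case from the question: latitudes differ by 0.0001
    samples = [
        '179.xxx.xxx.xxx,Brazil,São Paulo,-23.3938,-46.4951',
        '177.xxx.xxx.xxx,Brazil,Maceió,-23.3939,-46.4951',
    ]
    duplicates = find_duplicate_lines(samples)
    if duplicates:
        for group in duplicates:
            print('\n'.join(group))
    else:
        print('No repetitions')
```

To use it on the real file, the file object can be passed directly, e.g. with open('in.txt', encoding='utf-8') as f: duplicates = find_duplicate_lines(f). The UTF-8 encoding is an assumption based on the accented city names. Because the pair is compared as strings, "-23.3938" and "-23.3939" end up in different groups.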

This line is your problem:

if columns[3] and columns[4] not in entries:

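Python reads it as columns[3] and (columns[4] not in entries), so the latitude is only checked for being non-empty and only the longitude is actually looked up. Compare latitude and longitude together, as one pair:

if (columns[3], columns[4]) not in entries:

and append the pair itself, entries.append((columns[3], columns[4])), instead of the two separate appends. Make the same change for duplicate_entries and for its if condition.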
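A small sketch of why the original condition misfires, using the question's third example (latitude -23.3938 against an already-seen -23.3939, same longitude). The names original, grouped, seen_pairs and is_new are illustrative only.

```python
entries = ['-23.3939', '-46.4951']  # flat list built from the first line
columns = '177.xxx.xxx.xxx,Brazil,Maceió,-23.3938,-46.4951'.split(',')

# How Python groups the original condition: `not in` binds tighter than `and`
original = columns[3] and columns[4] not in entries
grouped = columns[3] and (columns[4] not in entries)
assert original == grouped  # identical; the latitude is only tested for truthiness

# Comparing the coordinates as a single (latitude, longitude) pair instead
seen_pairs = {('-23.3939', '-46.4951')}
is_new = (columns[3], columns[4]) not in seen_pairs

print(original, is_new)
```

Here original is False (so the else branch marks the line as a duplicate), while the pair check correctly reports the coordinates as new.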

Hope it helps!
