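I am trying to find the rows of a text file in which two columns hold repeated values. The file has 1000 entries with the following structure: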
ip,country,city,latitude,longitude
Here is a real example:
179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951
177.xxx.xxx.xxx,Brazil,Maceió,-9.5934,-35.7568
... (and so on to the end of the file)
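I want to compare "-23.3939", "-46.4951" with "-9.5934", "-35.7568" to check whether the two coordinates are the same and, if they are, write the whole line to another text file. I found something on Stack Overflow, but it only works when I use the latitude alone; I want to get and compare both the latitude and the longitude. The original code: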
entries = []
duplicate_entries = []
with open('in.txt', 'r') as my_file:
    for line in my_file:
        columns = line.strip().split(',')
        if columns[2] not in entries:
            entries.append(columns[2])
        else:
            duplicate_entries.append(columns[2])
if len(duplicate_entries) > 0:
    with open('out.txt', 'w') as out_file:
        with open('in.txt', 'r') as my_file:
            for line in my_file:
                columns = line.strip().split(',')
                if columns[2] in duplicate_entries:
                    print line.strip()
                    out_file.write(line)
else:
    print "No repetitions"
To do what I want, I tried this:
entries = []
duplicate_entries = []
with open('/home/usr/python-programming/ip-infos', 'r') as arq:
    for line in arq:
        columns = line.strip().split(',')
        if columns[3] and columns[4] not in entries:
            entries.append(columns[3])
            entries.append(columns[4])
        else:
            duplicate_entries.append(columns[3])
            duplicate_entries.append(columns[4])
arq.close()
if len(duplicate_entries) > 0:
    with open('/home/usr/python-programming/suspects', 'w') as out_file:
        with open('/home/usr/python-programming/ip-infos', 'r') as arq:
            for line in arq:
                columns = line.strip().split(',')
                if columns[3] and columns[4] in duplicate_entries:
                    print line.strip()
                    out_file.write(line)
    out_file.close()
    arq.close()
else:
    print "No repetitions"
So, when I edit the text file, here is the output:
179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951
177.xxx.xxx.xxx,Brazil,Maceió,-9.5934,-35.7568
>output: "No repetitions" and nothing is written to the out_file (correct)
179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951
177.xxx.xxx.xxx,Brazil,Maceió,-23.3939,-46.4951
>output: 179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951
177.xxx.xxx.xxx,Brazil,Maceió,-23.3939,-46.4951
(and these two lines appear in the out_file (correct))
But if I do this:
179.xxx.xxx.xxx,Brazil,São Paulo,-23.3938,-46.4951
177.xxx.xxx.xxx,Brazil,Maceió,-23.3939,-46.4951
>output: 179.xxx.xxx.xxx,Brazil,São Paulo,-23.3938,-46.4951
177.xxx.xxx.xxx,Brazil,Maceió,-23.3939,-46.4951
(and these two lines appear in the out_file (incorrect))
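Even though the longitudes are equal, "-23.3938" is different from "-23.3939", so these lines should not appear in the out_file, and the terminal should show "No repetitions". I have been trying for hours, but I am still learning and I don't know how to do this. Can anyone help me?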
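You can also put the input from the file into a dictionary. That way you get the duplicates in a single pass.
Key the dictionary on (lat, long); each value is then a list of all the lines that share those coordinates.
defaultdict from collections can be used instead of setdefault(k, []).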
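The dictionary approach described above could be sketched roughly as follows. This is only an illustration, not the answerer's original snippet: the function names `group_by_coordinates` and `duplicated_lines` are hypothetical, the sample rows are the ones from the question, and the code uses Python 3 syntax (the question's code is Python 2). It also assumes that no field contains a comma, so a plain `split(',')` is enough.

```python
def group_by_coordinates(lines):
    """Return a dict mapping (latitude, longitude) to the lines that share it."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        columns = line.split(',')
        key = (columns[3], columns[4])  # treat the pair as one unit
        groups.setdefault(key, []).append(line)
    return groups


def duplicated_lines(lines):
    """Return every line whose coordinate pair occurs more than once."""
    groups = group_by_coordinates(lines)
    return [line for rows in groups.values() if len(rows) > 1 for line in rows]


# Sample rows taken from the question.
same_pair = [
    "179.xxx.xxx.xxx,Brazil,São Paulo,-23.3939,-46.4951",
    "177.xxx.xxx.xxx,Brazil,Maceió,-23.3939,-46.4951",
]
different_latitude = [
    "179.xxx.xxx.xxx,Brazil,São Paulo,-23.3938,-46.4951",
    "177.xxx.xxx.xxx,Brazil,Maceió,-23.3939,-46.4951",
]

print(duplicated_lines(same_pair))           # both lines are reported
print(duplicated_lines(different_latitude))  # nothing is reported
```

Because an open file iterates line by line, the same functions should also accept a file object directly, for example `with open('in.txt') as my_file: duplicated_lines(my_file)`. With `collections.defaultdict(list)` the `setdefault` call would become `groups[key].append(line)`.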
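This line is your problem:
    if columns[3] and columns[4] not in entries:
Python evaluates it as columns[3] and (columns[4] not in entries), so only the longitude is looked up in the list; the latitude is merely tested for being a non-empty string. Each value needs its own membership test.
You also need to make the same change to the if condition that uses duplicate_entries. Hope that helps!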
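To make the precedence issue concrete, here is a small sketch using values from the question. It goes one step further than testing each value separately and compares the (latitude, longitude) pair as a single tuple, which is one possible fix and an assumption on my part, not necessarily what the answer originally showed. The names `find_repeated` and `suspects` are hypothetical, and the code is written for Python 3.

```python
# What the original condition actually computes.
entries = ["-23.3938", "-46.4951"]   # values stored after the first row
lat, lon = "-23.3939", "-46.4951"    # second row: new latitude, same longitude

original = lat and lon not in entries  # parsed as: lat and (lon not in entries)
print(original)                        # False, so the row is treated as a repeat

seen_pairs = {("-23.3938", "-46.4951")}
pair_is_new = (lat, lon) not in seen_pairs
print(pair_is_new)                     # True, the pair as a whole is new


def find_repeated(lines):
    """First pass: collect the coordinate pairs that occur more than once."""
    seen, repeated = set(), set()
    for line in lines:
        columns = line.strip().split(',')
        pair = (columns[3], columns[4])
        if pair in seen:
            repeated.add(pair)
        else:
            seen.add(pair)
    return repeated


def suspects(lines):
    """Second pass: keep the lines whose pair was found to be repeated."""
    repeated = find_repeated(lines)
    return [line.strip() for line in lines
            if tuple(line.strip().split(',')[3:5]) in repeated]


rows = [
    "179.xxx.xxx.xxx,Brazil,São Paulo,-23.3938,-46.4951",
    "177.xxx.xxx.xxx,Brazil,Maceió,-23.3939,-46.4951",
]
print(suspects(rows) or "No repetitions")
```

Storing tuples also avoids a second pitfall of the flat list: a latitude from one row and a longitude from another can no longer combine into a false match.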