Python: extract the items from different lists and put them in one set

Posted 2024-07-04 05:26:37


I have a file like this:

93.93.203.11|["['vmit.it', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'maurominnella.com']"]
168.144.9.16|["['iipmalumni.com','webdesignhostingindia.com', 'iipmstudents.in', 'iipmclubs.in']"]
195.211.72.88|["['tcmpraktijk-jingshen.nl', 'ellen-siemer.nl'']"]
129.35.210.118|["['israelinnovation.co.il', 'watec-peru.com', 'bsacimeeting.org', 'wsava2015.com', 'picsmeeting.com']"]

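I want to extract the domains from all the lists and add them to a single set. In the end, I want every unique domain on its own line in a file. Here is the code I wrote: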

set_d = set()
f = open(file,'r')
for line in f:
    line = line.strip('\n')
    ip,list = line.split('|')
    l = json.loads(list)
    for e in l:
        domain = e.split(',')
        set_d.add(domain)
        print set_d

But it gives the following error:

    set_d.add(domain)
TypeError: unhashable type: 'list'

Can anyone help me?


3 answers

Use str.translate to clean up the text and update to add the pieces to the set:

set_d = set()
with open(file,'r') as f:
    for line in f:
        lst = (x.strip() for x in line.split("|")[1].translate(None,"\"'[]").split(","))
        set_d.update(lst)

This produces a set of unique individual domains:

set(['vmit.it', 'tcmpraktijk-jingshen.nl', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'watec-peru.com', 'bsacimeeting.org', 'webdesignhostingindia.com', 'wsava2015.com', 'iipmstudents.in', 'maurominnella.com', 'ellen-siemer.nl', 'picsmeeting.com', 'iipmalumni.com', 'iipmclubs.in', 'israelinnovation.co.il'])
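The two-argument form translate(None, chars) used above exists only on Python 2 byte strings. As a rough sketch of the same idea on Python 3, assuming the file keeps the ip|payload shape shown in the question, a deletion table can be built with str.maketrans. The names extract_domains and sample_lines are hypothetical and used only for illustration.

```python
# Python 3 sketch: str.maketrans builds a deletion table for str.translate.
# `extract_domains` and `sample_lines` are hypothetical names for illustration.
DELETE_CHARS = str.maketrans("", "", "\"'[]")


def extract_domains(lines):
    """Collect the unique domains from lines shaped like 'ip|["[...]"]'."""
    domains = set()
    for line in lines:
        # Keep the part after the first '|', then drop quotes and brackets.
        _, _, payload = line.strip().partition("|")
        cleaned = payload.translate(DELETE_CHARS)
        # Split on commas and skip empty pieces (e.g. from blank lines).
        domains.update(part.strip() for part in cleaned.split(",") if part.strip())
    return domains


sample_lines = [
    "195.211.72.88|[\"['tcmpraktijk-jingshen.nl', 'ellen-siemer.nl'']\"]",
    "168.144.9.16|[\"['iipmalumni.com','webdesignhostingindia.com']\"]",
]
print(sorted(extract_domains(sample_lines)))
```

Because every quote and bracket is deleted before splitting, the stray extra quote after ellen-siemer.nl in the sample data does no harm here.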

You can write it to a new file:

set_d = set()
with open(file,'r') as f,open("out.txt","w") as out:
    for line in f:
        lst = (x.strip() for x in line.split("|")[1].translate(None,"\"'[]").split(","))
        set_d.update(lst)
    for line in set_d:
        out.write("{}\n".format(line))

Output:

$ cat out.txt 
vmit.it
tcmpraktijk-jingshen.nl
umbertominnella.it
studioguizzardi.it
telestreet.it
watec-peru.com
bsacimeeting.org
webdesignhostingindia.com
wsava2015.com
iipmstudents.in
maurominnella.com
ellen-siemer.nl
picsmeeting.com
iipmalumni.com
iipmclubs.in
israelinnovation.co.il

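Your code does not split the text into separate domains, and your json call does not actually help. Changing your code to use update would output something like this: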

{" 'maurominnella.com']", " 'wsava2015.com'", "'webdesignhostingindia.com'", " 'iipmclubs.in']", " 'ellen-siemer.nl'']", " 'umbertominnella.it'", " 'picsmeeting.com']", "['israelinnovation.co.il'", "['vmit.it'", " 'iipmstudents.in'", "['tcmpraktijk-jingshen.nl'", " 'studioguizzardi.it'", "['iipmalumni.com'", " 'watec-peru.com'", " 'bsacimeeting.org'", " 'telestreet.it'"}

Also, don't use list as a variable name, because it shadows Python's built-in list.
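To see why the json call does not help, it may be useful to look at what json.loads returns for one line. The snippet below is a small Python 3 sketch using a shortened line from the question; the variable names are illustrative only.

```python
import json

line = "93.93.203.11|[\"['vmit.it', 'umbertominnella.it']\"]"
ip, payload = line.split("|")

parsed = json.loads(payload)
# json.loads yields a list holding ONE string; the inner single-quoted
# list is just text as far as JSON is concerned.
print(len(parsed), type(parsed[0]).__name__)

# Splitting that string on commas leaves brackets and quotes attached.
pieces = parsed[0].split(",")
print(pieces)
```

The pieces still carry the brackets and quotes, which is why the cleanup step is needed before the values go into the set.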

You should call update, not add:

set_d.update(domain)

For example:

>>> set_d = {'a', 'b', 'c'}
>>> set_d.update(['c', 'd', 'e'])
>>> print(set_d)
{'a', 'b', 'c', 'd', 'e'}
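For comparison, here is a short Python 3 sketch that puts the failing add call next to the working update call. The values are made up for illustration.

```python
domains = set()
parts = "a.com,b.org".split(",")  # a list: ['a.com', 'b.org']

try:
    domains.add(parts)  # add() tries to hash the list itself and fails
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

domains.update(parts)  # update() adds each element of the list
print(sorted(domains))
```

add expects a single hashable element, while update takes any iterable and inserts its items one by one.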

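Because the result of split is a list (domain = e.split(',')) and lists are unhashable, you cannot add it to a set with add. Instead, you can use set.update() to add its elements to the set. You don't need json here, though, because it does not separate the domains and does not give the result you want; you can use ast.literal_eval to parse the list instead: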

import ast
set_d = set()
f = open(file,'r')
for line in f:
    line = line.strip('\n')
    ip,li = line.split('|')
    l = ast.literal_eval(ast.literal_eval(li)[0])
    for e in l:
        domain = e.split(',')
        set_d.update(domain)
    print set_d

Note: don't use the names of Python built-in functions or types as variable names!
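One caveat with ast.literal_eval is that it raises on text that is not a valid Python literal, and the third sample line in the question has a stray extra quote after ellen-siemer.nl. The following is a Python 3 sketch that skips such lines; parse_line is a hypothetical helper, and returning an empty list is just one possible policy (a real script might log the line or fall back to plain string cleanup).

```python
import ast


def parse_line(line):
    """Return the domains on one 'ip|payload' line, or [] if it is malformed."""
    _, _, payload = line.strip().partition("|")
    try:
        outer = ast.literal_eval(payload)        # -> ["['a.com', 'b.org']"]
        return list(ast.literal_eval(outer[0]))  # -> ['a.com', 'b.org']
    except (ValueError, SyntaxError, TypeError, IndexError):
        # literal_eval rejects text that is not a valid Python literal,
        # such as the unbalanced quote in 'ellen-siemer.nl''].
        return []


good = "93.93.203.11|[\"['vmit.it', 'telestreet.it']\"]"
bad = "195.211.72.88|[\"['tcmpraktijk-jingshen.nl', 'ellen-siemer.nl'']\"]"

domains = set(parse_line(good)) | set(parse_line(bad))
print(sorted(domains))
```

With this policy the malformed line contributes nothing, so its domains are lost unless they are handled some other way.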

As a more efficient approach, you can use a regex to grab the domains:

f = open(file,'r').read()
import re
print set(re.findall(r'[a-zA-Z\-]+\.[a-zA-Z]+',f))

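Result (note that wsava2015.com is missing and israelinnovation.co.il is cut to israelinnovation.co, because the pattern allows no digits and matches only one dot):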

set(['vmit.it', 'tcmpraktijk-jingshen.nl', 'umbertominnella.it', 'studioguizzardi.it', 'telestreet.it', 'israelinnovation.co', 'bsacimeeting.org', 'webdesignhostingindia.com', 'iipmstudents.in', 'maurominnella.com', 'ellen-siemer.nl', 'picsmeeting.com', 'watec-peru.com', 'iipmalumni.com', 'iipmclubs.in'])
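A variant of the pattern that also accepts digits and names with more than one dot might look like the Python 3 sketch below. The pattern DOMAIN_RE is an assumption for illustration, not a full hostname validator; it requires the last label to be alphabetic so that the IP addresses at the start of each line are not picked up.

```python
import re

# Labels may contain letters, digits and hyphens; the final label must be
# alphabetic, which keeps dotted IP addresses from matching.
DOMAIN_RE = re.compile(r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}")

text = (
    "129.35.210.118|[\"['israelinnovation.co.il', 'wsava2015.com']\"]\n"
    "195.211.72.88|[\"['tcmpraktijk-jingshen.nl', 'ellen-siemer.nl'']\"]\n"
)

found = set(DOMAIN_RE.findall(text))
print(sorted(found))
```

The group is non-capturing, so findall returns the whole match rather than only the last repeated label.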