Check whether a list contains an element by substring

2 answers

User

#1 · edited 2024-09-30 06:16:50

You could try adding another for loop, if you don't mind. Something like:

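# urls is assumed to be your list of URL strings
for i, url in enumerate(urls):
    # print url only if its first 30 characters appear in no other entry
    for j, other in enumerate(urls):
        if i != j and url[:30] in other:
            break
    else:
        print(url)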

This compares each entry with every other entry to check whether they match. It is only an example; I'm sure you can make it more robust.
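One way to make the idea sturdier is to remember the prefixes already seen instead of comparing every pair. This is a minimal sketch, not the answerer's code: the helper name `unique_by_prefix`, the `length` parameter, and the sample URLs are assumptions made for illustration.

```python
def unique_by_prefix(urls, length=30):
    """Return the URLs whose first `length` characters were not seen earlier."""
    seen = set()   # prefixes encountered so far
    kept = []      # first URL found for each prefix, in input order
    for url in urls:
        prefix = url[:length]
        if prefix not in seen:
            seen.add(prefix)
            kept.append(url)
    return kept


# Hypothetical sample data: the first two share their first 30 characters.
sample = [
    "http://www.myurlnumber1.com/foo+%bar%baz%qux",
    "http://www.myurlnumber1.com/foo+%bar%baz%qux2",
    "http://www.myurlnumber2.com",
]
print(unique_by_prefix(sample))
```

A set lookup keeps this to a single pass over the list, whereas the nested loop compares every pair.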

User

#2 · edited 2024-09-30 06:16:50

If you consider any URLs with the same netloc to be the same, you can parse them with urllib.parse:

from urllib.parse import urlparse  # Python 2: from urlparse import urlparse

u = "http://www.myurlnumber1.com/foo+%bar%baz%qux"

print(urlparse(u).netloc)

This gives you:

www.myurlnumber1.com

So, to get the unique netlocs you could do:

unique = {urlparse(u).netloc for u in urls}

If you want to keep the URL scheme:

urls = ["http://www.myurlnumber1.com/foo+%bar%baz%qux", "http://www.myurlnumber1.com"]

unique = {"{}://{}".format(u.scheme, u.netloc) for u in map(urlparse, urls)}
print(unique)

This assumes every URL has a scheme, and that you do not have both http and https URLs for the same netloc that you want to treat as the same.
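If those assumptions do not hold, a rough workaround is to ignore the scheme and normalize before parsing. `urlparse` only recognizes a netloc that is introduced by `//`, so a URL written without a scheme ends up entirely in `path`. The helper name `netloc_of` below is hypothetical, and the `"://"` test is a simplification that would misfire on a scheme-less URL carrying another URL in its query string.

```python
from urllib.parse import urlparse


def netloc_of(url):
    """Return the lower-cased netloc, tolerating URLs written without a scheme."""
    # Without a leading "//", urlparse puts the host into .path, not .netloc.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    return urlparse(url).netloc.lower()


urls = [
    "http://www.url.com/foo-bar",
    "https://www.url.com/x",
    "www.url.com/baz-qux",
]
unique = {netloc_of(u) for u in urls}
print(unique)
```

Here http, https, and scheme-less forms of the same host collapse into a single entry.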

If you also want to include the path:

unique = {(u.netloc, u.path) for u in map(urlparse, urls)}

The table of attributes is listed in the docs:

Attribute  Index  Value                               Value if not present
scheme     0      URL scheme specifier                scheme parameter
netloc     1      Network location part               empty string
path       2      Hierarchical path                   empty string
params     3      Parameters for last path element    empty string
query      4      Query component                     empty string
fragment   5      Fragment identifier                 empty string
username          User name                           None
password          Password                            None
hostname          Host name (lower case)              None
port              Port number as integer, if present  None
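To see how the table maps onto a parsed result, here is a small sketch. The example URL is made up for illustration; the attribute names are the ones listed in the table.

```python
from urllib.parse import urlparse

# Hypothetical URL chosen to exercise most of the attributes.
parts = urlparse("https://user:secret@www.Example.com:8080/a/b?q=1#top")

print(parts.scheme)    # URL scheme specifier
print(parts.netloc)    # full network location, including credentials and port
print(parts.path)      # hierarchical path
print(parts.query)     # query component
print(parts.fragment)  # fragment identifier
print(parts.hostname)  # host name, lower-cased
print(parts.port)      # port as an integer

# The first six attributes are also reachable by index, as the Index column says.
print(parts[1] == parts.netloc)
```

Note that `hostname` is lower-cased while `netloc` keeps the original casing, which matters if you use either one as a uniqueness key.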

You just need to use whichever parts you consider unique.

In [1]: from urllib.parse import urlparse

In [2]: urls = ["http://www.url.com/foo-bar", "http://www.url.com/foo-bar?t=baz", "www.url.com/baz-qux", "www.url.com/foo-bar?t=baz"]

In [3]: unique = {"".join((u.netloc, u.path)) for u in map(urlparse, urls)}

In [4]: print(unique)
{'www.url.com/baz-qux', 'www.url.com/foo-bar'}
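A set discards both the original URLs and their order. If you would rather keep the first full URL seen for each key, a dict can do the same job. This is a sketch under assumptions: the function name `first_per_key` and its `key` parameter are hypothetical, and it relies on dicts preserving insertion order (Python 3.7+).

```python
from urllib.parse import urlparse


def first_per_key(urls, key=lambda parts: parts.netloc + parts.path):
    """Return the first URL seen for each key computed from its parsed parts."""
    first = {}
    for url in urls:
        # setdefault stores the URL only if this key has not been seen yet
        first.setdefault(key(urlparse(url)), url)
    return list(first.values())


urls = [
    "http://www.url.com/foo-bar",
    "http://www.url.com/foo-bar?t=baz",
    "www.url.com/baz-qux",
    "www.url.com/foo-bar?t=baz",
]
print(first_per_key(urls))
```

Passing a different `key`, for example `lambda parts: parts.netloc`, changes which parts count as unique without touching the loop.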
