How to convert URLs to their subdomains using Python

Published 2024-10-02 14:17:44


I have a list of URLs in a urls.txt file, like this:

https://benetech.blogspot.com/2019/02/robin-seaman-agent-of-inclusion.html
https://nikpeachey.blogspot.com/2020/01/digital-tools-for-teachers-trainers.html
https://blogurls245.blogspot.com/

Now I want to convert every URL in urls.txt to its subdomain, like this:

https://benetech.blogspot.com
https://nikpeachey.blogspot.com
https://blogurls245.blogspot.com

I tried to do this with the TLD module, but as an absolute beginner in Python I couldn't figure it out.

It would be great if someone could help me do this in Python.


3 Answers
from urllib.parse import urlparse

sample_url = 'https://benetech.blogspot.com/2019/02/robin-seaman-agent-of-inclusion.html'

parsed_url = urlparse(sample_url)
subdomain = f'{parsed_url.scheme}://{parsed_url.hostname}'

print(subdomain)

Output:

https://benetech.blogspot.com
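The question asks about a whole file rather than a single URL, so here is a minimal sketch of the same urlparse approach applied line by line. The helper names `to_base_url` and `convert_lines` are made up for illustration, and the sample lines stand in for the contents of urls.txt. Note that `hostname` lowercases the host and drops any port, which is fine for the URLs in the question.

```python
from urllib.parse import urlparse


def to_base_url(url):
    """Return scheme://hostname, dropping the path, query and fragment."""
    parsed = urlparse(url.strip())
    return f"{parsed.scheme}://{parsed.hostname}"


def convert_lines(lines):
    """Convert every non-blank line; blank lines are skipped."""
    return [to_base_url(line) for line in lines if line.strip()]


# Stand-in for the lines of urls.txt (hypothetical in-memory sample).
sample_lines = [
    "https://benetech.blogspot.com/2019/02/robin-seaman-agent-of-inclusion.html\n",
    "https://nikpeachey.blogspot.com/2020/01/digital-tools-for-teachers-trainers.html\n",
    "\n",
    "https://blogurls245.blogspot.com/\n",
]

for base in convert_lines(sample_lines):
    print(base)
```

An open file object is also an iterable of lines, so with the real file something like `with open("urls.txt") as f: bases = convert_lines(f)` should work the same way.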

Do it like this:

url = 'https://benetech.blogspot.com/2019/02/robin-seaman-agent-of-inclusion.html'

parts = url.split('/')

subdomain = parts[0] + '//' + parts[2]

subdomain will be --> https://benetech.blogspot.com

split('/') splits the string into parts at each /, e.g. 'my/name/is/Amirreza' becomes ['my', 'name', 'is', 'Amirreza'].
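To see why indexes 0 and 2 are the right ones, it may help to print the list that split('/') produces for a full URL: the two slashes after the scheme leave an empty string at index 1. The sketch below also adds a small guard, since the plain split approach silently gives a wrong result (or an IndexError) when a line has no scheme. The function name `base_by_split` is hypothetical.

```python
def base_by_split(url):
    """Build scheme://host using only str.split, with a basic sanity check."""
    parts = url.split('/')
    # For an absolute URL the list starts with ['https:', '', 'host', ...];
    # parts[1] is the empty string between the two slashes.
    if len(parts) < 3 or not parts[0].endswith(':') or parts[1] != '':
        raise ValueError(f"not an absolute URL: {url!r}")
    return parts[0] + '//' + parts[2]


url = 'https://blogurls245.blogspot.com/'
print(url.split('/'))      # ['https:', '', 'blogurls245.blogspot.com', '']
print(base_by_split(url))  # https://blogurls245.blogspot.com
```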

Use the urllib.parse module to parse the URL into its components, then put it back together, leaving out the parts you're not interested in:

from urllib.parse import urlsplit, urlunsplit

url = 'https://benetech.blogspot.com/2019/02/robin-seaman-agent-of-inclusion.html'

base = urlunsplit(urlsplit(url)[:2] + ('', '', ''))
print(base)  # https://benetech.blogspot.com
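One difference from the urlparse answer is worth knowing: the second element of the split result is the netloc, which keeps any port and the original letter case, while the `hostname` attribute is lowercased and has no port. For the blogspot URLs in the question both give the same result. The sketch below compares the two on a made-up URL; the function names and the example URL are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def base_keep_netloc(url):
    """Keep scheme and netloc exactly as written (port and case preserved)."""
    scheme, netloc = urlsplit(url)[:2]
    return urlunsplit((scheme, netloc, '', '', ''))


def base_hostname_only(url):
    """Keep scheme and the lowercased hostname; any port is dropped."""
    parts = urlsplit(url)
    return f'{parts.scheme}://{parts.hostname}'


url = 'https://Example.com:8443/some/path?q=1'  # hypothetical URL
print(base_keep_netloc(url))    # https://Example.com:8443
print(base_hostname_only(url))  # https://example.com
```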
