Get protocol and domain (without subdomain) from a URL

Posted 2024-10-02 12:24:42


This is an extension of "Get protocol + host name from URL", with one added requirement: I only want the domain name, not the subdomain.

For example:

Input: classes.usc.edu/xxx/yy/zz
Output: usc.edu

Input: mail.google.com
Output: google.com

Input: google.co.uk
Output: google.co.uk

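For more context: I accept one or more seed URLs from the user and then run a Scrapy crawler on the links. I need the domain name (without the subdomain) to set the allowed_urls attribute.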

I have also looked at "Python urlparse -- extract domain name without subdomain", but the answers there seem outdated.

My current code uses urlparse, but that also returns the subdomain, which I don't want.

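The behaviour described above can be reproduced with a minimal sketch like the following. It is not the asker's actual code; the helper name scheme_and_host and the https:// prefix on the sample URL are assumptions made for illustration.

```python
from urllib.parse import urlparse


def scheme_and_host(url):
    """Return 'scheme://host' for a URL (hypothetical helper name)."""
    parts = urlparse(url)
    # netloc is the full host, so any subdomain is still included.
    return f"{parts.scheme}://{parts.netloc}"


print(scheme_and_host("https://classes.usc.edu/xxx/yy/zz"))
# -> https://classes.usc.edu

# Without a scheme, urlparse treats the whole string as a path
# and leaves netloc empty.
print(repr(urlparse("classes.usc.edu/xxx/yy/zz").netloc))
# -> ''
```

urlparse only splits a URL into its syntactic parts. It has no knowledge of which labels form a public suffix such as co.uk, so it cannot tell a subdomain from the registered domain on its own.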

Is there a way in Python 3.x (hopefully in the stdlib) to get only the domain?


1 answer

Answer 1 · posted 2024-10-02 12:24:42

I use tldextract when I need to parse domains.

In your case, you only need to combine domain + suffix:

import tldextract

tldextract.extract('mail.google.com')
# ExtractResult(subdomain='mail', domain='google', suffix='com')
tldextract.extract('classes.usc.edu/xxx/yy/zz')
# ExtractResult(subdomain='classes', domain='usc', suffix='edu')
tldextract.extract('google.co.uk')
# ExtractResult(subdomain='', domain='google', suffix='co.uk')
