How can I replace all URLs in a string with `hostnametld`?

Posted 2024-09-27 23:15:22

For example:

http://stackoverflow.com/questions/ask => stackoverflowcom

The approach below works, but it fails in the corner case where "http" or "https" appears in the string outside of a URL.

import re
from urllib.parse import urlparse

def convert_urls_to_hostnames(s):
    try:
        new_s = re.sub("http\S+", lambda match: urlparse(match.group()).hostname.replace('.','') if match.group() else urlparse(match.group()).hostname, s)
        return new_s
    except Exception as e:
        print(e)
    return s

In the basic case, this works:

s = "Ask questions here: http://stackoverflow.com/questions/ask"
print(convert_urls_to_hostnames(s))

This correctly returns: Ask questions here: stackoverflowcom

However, if "http" or "https" appears anywhere in the string outside of a URL, it fails, as shown here:

s = "Urls may start with http or https like so: http://stackoverflow.com/questions/ask and https://example.com/questions/"
print(convert_urls_to_hostnames(s))

Output: 'NoneType' object has no attribute 'replace'
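
The error appears to come from the bare word "https" in the sentence: the pattern `http\S+` matches it, and `urlparse` finds no hostname in it. The bare "http" is not matched, because it is followed by a space and `\S+` needs at least one more non-space character. A quick check, reusing the string from the example (the variable name `matches` is only for illustration):

```python
import re
from urllib.parse import urlparse

s = "Urls may start with http or https like so: http://stackoverflow.com/questions/ask and https://example.com/questions/"

# Everything the original pattern picks up, including the bare word "https".
matches = re.findall(r"http\S+", s)
print(matches)
# ['https', 'http://stackoverflow.com/questions/ask', 'https://example.com/questions/']

# A bare "https" has no scheme separator or network location, so hostname is None.
print(urlparse("https").hostname)
# None
```

Calling `.replace` on that `None` raises the AttributeError, which the `except` block prints before the function falls through to `return s`.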

Expected output: Urls may start with http or https like so: stackoverflowcom and examplecom
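
One possible way to get the expected output is to require the `://` separator in the pattern and to leave a match untouched when no hostname can be parsed. This is a minimal sketch, not a definitive fix: it assumes only `http://` and `https://` URLs need rewriting, and the names `URL_PATTERN` and `_hostname_without_dots` are made up for illustration.

```python
import re
from urllib.parse import urlparse

# Assumption: only http:// and https:// URLs should be rewritten.
URL_PATTERN = re.compile(r"https?://\S+")

def _hostname_without_dots(match):
    """Return the matched URL's hostname with the dots removed."""
    url = match.group()
    try:
        hostname = urlparse(url).hostname
    except ValueError:
        # urlparse can raise on malformed input such as a broken IPv6 literal.
        return url
    if hostname is None:
        # Nothing usable was parsed, so keep the original text.
        return url
    return hostname.replace(".", "")

def convert_urls_to_hostnames(s):
    return URL_PATTERN.sub(_hostname_without_dots, s)

s = "Urls may start with http or https like so: http://stackoverflow.com/questions/ask and https://example.com/questions/"
print(convert_urls_to_hostnames(s))
# Urls may start with http or https like so: stackoverflowcom and examplecom
```

Because `\S+` consumes everything up to the next whitespace, punctuation that directly follows a URL (a trailing comma, for instance) is treated as part of the match, so real-world text may need a stricter pattern.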

