How do I replace all URLs in a string with `hostnametld`?

Posted 2024-09-27 23:15:22

For example:

http://stackoverflow.com/questions/ask => stackoverflowcom

The approach below works, but it fails in the corner case where http(s) appears in the string outside a URL.

import re
from urllib.parse import urlparse

def convert_urls_to_hostnames(s):
    try:
        # Replace every token starting with "http" by its hostname, dots removed.
        # urlparse(...).hostname is None when the token is not a real URL.
        new_s = re.sub(r"http\S+", lambda match: urlparse(match.group()).hostname.replace('.', ''), s)
        return new_s
    except Exception as e:
        print(e)
    return s

In the basic case, this works:

s = "Ask questions here: http://stackoverflow.com/questions/ask"
print(convert_urls_to_hostnames(s))

This correctly returns: Ask questions here: stackoverflowcom

However, it fails if http(s) appears anywhere in the string outside a URL, as shown here:

s = "Urls may start with http or https like so: http://stackoverflow.com/questions/ask and https://example.com/questions/"
print(convert_urls_to_hostnames(s))

This prints the error 'NoneType' object has no attribute 'replace', and the string comes back unchanged.

Expected: Urls may start with http or https like so: stackoverflowcom and examplecom
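To see where the failure comes from, it can help to list what the pattern actually matches. The snippet below is a small diagnostic sketch, not part of the original code; the variable name `tokens` is made up for illustration.

```python
import re
from urllib.parse import urlparse

s = ("Urls may start with http or https like so: "
     "http://stackoverflow.com/questions/ask and https://example.com/questions/")

# "http\S+" needs at least one non-space character after "http",
# so the bare word "https" matches even though it is not a URL.
tokens = re.findall(r"http\S+", s)
print(tokens)

# With no "//" after the scheme there is no network location,
# so urlparse reports no hostname for the bare word.
print(urlparse("https").hostname)
```

The first match is the plain word "https", for which `hostname` is `None`, and calling `.replace` on `None` raises the error shown above.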

1 Answer

#1 · Posted 2024-09-27 23:15:22

Match http:// or https:// in the regular expression, i.e. re.sub("https?://\S+", lambda ...
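The suggestion above could be fleshed out roughly as follows. This is a sketch rather than the answerer's exact code: the helper `_hostname_without_dots`, the constant `URL_PATTERN`, and the fallback of leaving a match untouched when no hostname can be parsed are all assumptions added for illustration.

```python
import re
from urllib.parse import urlparse

# Require the "://" separator so bare words like "http" or "https" are skipped.
URL_PATTERN = re.compile(r"https?://\S+")

def _hostname_without_dots(match):
    """Return the matched URL's hostname with dots removed."""
    url = match.group()
    try:
        hostname = urlparse(url).hostname
    except ValueError:
        # urlparse rejects some malformed inputs (e.g. unbalanced brackets).
        hostname = None
    if hostname is None:
        # Assumption: leave the text as-is when no hostname can be extracted.
        return url
    return hostname.replace(".", "")

def convert_urls_to_hostnames(s):
    return URL_PATTERN.sub(_hostname_without_dots, s)

s = ("Urls may start with http or https like so: "
     "http://stackoverflow.com/questions/ask and https://example.com/questions/")
print(convert_urls_to_hostnames(s))
# Urls may start with http or https like so: stackoverflowcom and examplecom
```

Checking for a missing hostname explicitly avoids the broad `except Exception`, so one odd token no longer causes the whole string to be returned unchanged.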
