For example:
http://stackoverflow.com/questions/ask
=>
stackoverflowcom
The approach below works, but not for the corner case where http*s appears outside of a URL.
import re
from urllib.parse import urlparse

def convert_urls_to_hostnames(s):
    try:
        new_s = re.sub("http\S+", lambda match: urlparse(match.group()).hostname.replace('.','') if match.group() else urlparse(match.group()).hostname, s)
        return new_s
    except Exception as e:
        print(e)
        return s
Basically, this works:
s = "Ask questions here: http://stackoverflow.com/questions/ask"
print(convert_urls_to_hostnames(s))
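Correctly returns: Ask questions here: stackoverflowcom
However, it fails if http*s (for example, the bare word https) appears anywhere in the string outside of a URL, as shown below: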
s = "Urls may start with http or https like so: http://stackoverflow.com/questions/ask and https://example.com/questions/"
print(convert_urls_to_hostnames(s))
Returns: 'NoneType' object has no attribute 'replace'
Expected: Urls may start with http or https like so: stackoverflowcom and examplecom
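The error can be traced to what the pattern actually matches. The following is a minimal sketch, using a shortened sample string chosen here for illustration: the pattern "http\S+" also matches the plain word https, and urlparse finds no network location in a bare word, so its hostname attribute is None.

```python
import re
from urllib.parse import urlparse

# Shortened sample string, chosen here for illustration.
s = "Urls may start with http or https like so: http://stackoverflow.com/questions/ask"

# "http" followed by one or more non-space characters also matches the word "https".
matches = re.findall(r"http\S+", s)
print(matches)

# A bare word has no network location, so .hostname is None for it.
hostnames = [urlparse(m).hostname for m in matches]
print(hostnames)
```

Calling .replace on that None is what raises the AttributeError, which the blanket except then prints.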
Look for http:// or https:// in the regular expression, i.e.
re.sub("https?://\S+", lambda ...
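Put together, the suggestion might look like the sketch below. This is one possible way to complete the function, not necessarily what the answer had in mind: the helper name _hostname_without_dots, the compiled URL_PATTERN constant, the raw-string pattern, and the fallback of leaving the text untouched when no hostname can be parsed are all assumptions added here.

```python
import re
from urllib.parse import urlparse

# Only text that starts with an explicit http:// or https:// scheme is treated as a URL.
URL_PATTERN = re.compile(r"https?://\S+")

def _hostname_without_dots(match):
    """Hypothetical helper: return the matched URL's hostname with dots removed."""
    url = match.group()
    try:
        hostname = urlparse(url).hostname
    except ValueError:
        # urlparse rejects some malformed inputs (e.g. a broken IPv6 literal).
        return url
    if hostname is None:
        # No hostname could be parsed (e.g. "http:///x"), so leave the text as-is.
        return url
    return hostname.replace(".", "")

def convert_urls_to_hostnames(s):
    return URL_PATTERN.sub(_hostname_without_dots, s)

s = "Urls may start with http or https like so: http://stackoverflow.com/questions/ask and https://example.com/questions/"
print(convert_urls_to_hostnames(s))
# Urls may start with http or https like so: stackoverflowcom and examplecom
```

Because the pattern requires ://, the bare words http and https are never passed to urlparse, and the explicit None check replaces the blanket try/except of the original function.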