How can I manipulate a URL string to extract a single piece?

def get_username_from_url(url):
    if url.startswith(r'http://www.'):
        user = url.replace(r'http://www.', '', 1)
        user = user.split('.')[0]
        return user
    elif url.startswith(r'http://'):
        user = url.replace(r'http://', '', 1)
        user = user.split('.')[0]
        return user

easy_url = "http://www.httpwwwweirdusername.com/"
hard_url = "http://httpwwwweirdusername.blogger.com/"

print(get_username_from_url(easy_url))  # output = httpwwwweirdusername (good! expected.)
print(get_username_from_url(hard_url))  # output = weirdusername (bad! username should = httpwwwweirdusername)

2 Answers

User

Answer 1 · edited 2024-09-30 22:15:56

There is a module called urlparse made specifically for this task (in Python 3 it is available as urllib.parse):

>>> from urlparse import urlparse
>>> url = "http://httpwwwweirdusername.blogger.com/"
>>> urlparse(url).hostname.split('.')[0]
'httpwwwweirdusername'

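For http://www.httpwwwweirdusername.com/, this would output the unwanted www. There are workarounds for ignoring the www part, for example taking the first item of the split hostname that is not equal to www: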

>>> from urlparse import urlparse

>>> url = "http://www.httpwwwweirdusername.com/"
>>> next(item for item in urlparse(url).hostname.split('.') if item != 'www')
'httpwwwweirdusername'

>>> url = "http://httpwwwweirdusername.blogger.com/"
>>> next(item for item in urlparse(url).hostname.split('.') if item != 'www')
'httpwwwweirdusername'
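The workaround above can be wrapped in a small function. This is only a sketch, assuming Python 3 (where urlparse is imported from urllib.parse); returning None for input without a hostname is a choice made here, not part of the original answer:

```python
from urllib.parse import urlparse


def get_username_from_url(url):
    """Return the first hostname label that is not 'www', or None."""
    # hostname is None when the URL has no network location part.
    hostname = urlparse(url).hostname
    if not hostname:
        return None
    # Take the first label that is not 'www'; the default of None
    # covers a host that consists only of 'www'.
    return next((label for label in hostname.split('.') if label != 'www'), None)


print(get_username_from_url("http://www.httpwwwweirdusername.com/"))
print(get_username_from_url("http://httpwwwweirdusername.blogger.com/"))
```

Passing a default to next() avoids a StopIteration when no label qualifies.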

User

Answer 2 · edited 2024-09-30 22:15:56

You can do this with a regular expression (the regex could probably be tweaked to be more precise or more efficient).

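import re

# Optional "www." prefix (dot escaped so it matches a literal '.'),
# followed by the first run of word characters, which is captured.
url_pattern = re.compile(r'.*/(?:www\.)?(\w+)')

def get_username_from_url(url):
    match = url_pattern.match(url)
    if match:
        return match.group(1)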

easy_url = "http://www.httpwwwweirdusername.com/"
hard_url = "http://httpwwwweirdusername.blogger.com/"

print(get_username_from_url(easy_url))
print(get_username_from_url(hard_url))

This produces:

httpwwwweirdusername
httpwwwweirdusername
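As the answer says, the regex can probably be made more precise. One possible tightening, offered as a sketch rather than a definitive pattern: anchor on the scheme and stop at the first dot, slash, or colon. The name URL_PATTERN and the decision to accept both http and https are assumptions made for this example:

```python
import re

# Hypothetical stricter pattern: scheme, optional literal "www.",
# then the first hostname label (anything up to '.', '/' or ':').
URL_PATTERN = re.compile(r'^https?://(?:www\.)?([^./:]+)')


def get_username_from_url(url):
    """Return the captured hostname label, or None if the URL does not match."""
    match = URL_PATTERN.match(url)
    return match.group(1) if match else None


print(get_username_from_url("http://www.httpwwwweirdusername.com/"))
print(get_username_from_url("http://httpwwwweirdusername.blogger.com/"))
```

Anchoring on the scheme avoids relying on the greedy .*/ prefix backtracking to the right slash.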
