Python regex doesn't recognize Markdown links

import re

text = '''
[Vocoder](http://en.wikipedia.org/wiki/Vocoder )
[Turing]( http://en.wikipedia.org/wiki/Alan_Turing)
[Autotune](http://en.wikipedia.org/wiki/Autotune)
http://en.wikipedia.org/wiki/The_Voder
'''

urls = re.findall(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*,]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', text)  # find all urls

for url in urls:
    url = re.escape(url)
    link_exp = re.compile(r'\[.*\]\(\s*{0}\s*\)'.format(url))  # expression with url wrapped in link syntax
    search = re.search(link_exp, text)
    if search is not None:
        print(url)

# expression should translate to:
# \[  - literal [
# .*  - any character or no character
# \]  - literal ]
# \(  - literal (
# \s* - whitespace or no whitespace
# {0} - the url
# \s* - whitespace or no whitespace
# \)  - literal )

# NOTE: I am including whitespace to encompass cases like [foo]( http://www.foo.sexy )

1 answer

Anonymous user · posted 2024-05-08 02:40:16

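The problem here is that the regex you use first, to pull out the URLs, is including the closing parenthesis ) in the URL. That means you end up looking for the closing parenthesis twice. This happens for every link except the first one (the space before the ) saves you there).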
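A quick way to check that diagnosis, assuming the sample text is laid out one item per line as in the question, is to run the question's URL pattern on its own and look at exactly what it extracts:

```python
import re

# Sample input from the question, one item per line.
text = '''
[Vocoder](http://en.wikipedia.org/wiki/Vocoder )
[Turing]( http://en.wikipedia.org/wiki/Alan_Turing)
[Autotune](http://en.wikipedia.org/wiki/Autotune)
http://en.wikipedia.org/wiki/The_Voder
'''

# The question's URL pattern, unchanged.
URL_PATTERN = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*,]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

urls = re.findall(URL_PATTERN, text)
for url in urls:
    print(repr(url))
# 'http://en.wikipedia.org/wiki/Vocoder'
# 'http://en.wikipedia.org/wiki/Alan_Turing)'
# 'http://en.wikipedia.org/wiki/Autotune)'
# 'http://en.wikipedia.org/wiki/The_Voder'
```

The Vocoder URL stops at the space, so the link pattern can still find its own \) afterwards. The Turing and Autotune URLs already swallowed the ), so the second search demands another ) that isn't there.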

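I'm not quite sure what every part of your URL regex is trying to do, but this part, [$-_@.&+], includes the range from $ (ASCII 36) to _ (ASCII 95). That range contains a lot of characters you probably don't mean, including ).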
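To make the range concrete, the sketch below lists every character that the class [$-_] accepts and cross-checks that list against the regex engine. The last line shows that escaping the hyphen turns it back into three literal characters; note that doing only that in the question's pattern would also drop / and : (which the range was silently covering), so it is not a drop-in fix.

```python
import re

# A hyphen between two characters inside [...] denotes a range, so
# [$-_] covers every code point from '$' (36) through '_' (95).
range_chars = ''.join(chr(c) for c in range(ord('$'), ord('_') + 1))
print(range_chars)
# $%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_

# Cross-check against the regex engine over printable ASCII (32..126).
matched = ''.join(chr(c) for c in range(32, 127) if re.fullmatch(r'[$-_]', chr(c)))
print(matched == range_chars)  # True

# Both parentheses fall inside the range.
print('(' in range_chars, ')' in range_chars)  # True True

# With the hyphen escaped, the class is just the three literals $, - and _.
print(re.fullmatch(r'[$\-_]', ')'))  # None
```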

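Rather than finding the URLs and then checking whether they are inside a link, why not do both at once? That way your URL regex can be lazier, because the extra surrounding constraints make it less likely to match something else: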

import re

# text is the sample string from the question

# Anything that isn't a closing square bracket
name_regex = r"[^]]+"
# http:// or https:// followed by anything but a closing paren
url_regex = r"http[s]?://[^)]+"

# [name]( url ) with optional whitespace just inside the parentheses
markup_regex = r'\[({0})]\(\s*({1})\s*\)'.format(name_regex, url_regex)

for match in re.findall(markup_regex, text):
    print(match)

Result:

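('Vocoder', 'http://en.wikipedia.org/wiki/Vocoder ')
('Turing', 'http://en.wikipedia.org/wiki/Alan_Turing')
('Autotune', 'http://en.wikipedia.org/wiki/Autotune')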

The URL regex can be improved if you need it to be stricter.
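One possible tightening, sketched here as an assumption rather than the answer's own code: also exclude whitespace from the URL character class, so a trailing space before ) is consumed by \s* instead of ending up inside the captured URL (as happens with the Vocoder link above). The names NAME_REGEX, URL_REGEX and MARKUP are illustrative, and named groups with re.finditer are used only to make the two captures self-describing.

```python
import re

# Sample input from the question, one item per line.
text = '''
[Vocoder](http://en.wikipedia.org/wiki/Vocoder )
[Turing]( http://en.wikipedia.org/wiki/Alan_Turing)
[Autotune](http://en.wikipedia.org/wiki/Autotune)
http://en.wikipedia.org/wiki/The_Voder
'''

# Link text: one or more characters that are not ']'.
NAME_REGEX = r'[^]]+'
# Stricter URL (illustrative assumption): http:// or https://, then
# neither ')' nor whitespace, so surrounding spaces go to the \s* parts.
URL_REGEX = r'https?://[^)\s]+'

MARKUP = re.compile(
    r'\[(?P<name>{0})\]\(\s*(?P<url>{1})\s*\)'.format(NAME_REGEX, URL_REGEX)
)

links = [(m.group('name'), m.group('url')) for m in MARKUP.finditer(text)]
for name, url in links:
    print(name, url)
# Vocoder http://en.wikipedia.org/wiki/Vocoder
# Turing http://en.wikipedia.org/wiki/Alan_Turing
# Autotune http://en.wikipedia.org/wiki/Autotune
```

The bare Voder URL is still ignored because it has no [name](...) wrapper. Because the URL class stops at the first ), link targets that themselves contain parentheses (for example Wikipedia article URLs with a parenthesized disambiguation suffix) will be cut short; handling those needs a more involved pattern or a real Markdown parser.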
