How to extract Markdown links with a regular expression?

import re

# Extract []() style links
link_name = "[^]]+"
link_url = "http[s]?://[^)]+"
markup_regex = rf'\[({link_name})]\(\s*({link_url})\s*\)'

for match in re.findall(markup_regex, '[a link](https://www.wiki.com/atopic_(subtopic))'):
    name = match[0]
    url = match[1]
    print(url)
    # url will be https://www.wiki.com/atopic_(subtopic

The URL is cut off at the first ")", so the closing parenthesis of "(subtopic)" is lost.

2 Answers

User

Answer 1 · edited 2024-10-01 05:00:30

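I think you need to distinguish between what counts as a valid link in Markdown and (optionally) what counts as a valid URL. For example, a valid Markdown link can also be a relative path, and a URL may or may not include the "http(s)" or "www" part.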

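Simply using link_url = "http[s]?://.+", or even link_url = ".*", is enough to make the code work. It fixes the problem of URLs that end in a parenthesis, and it just means you rely on the Markdown []() structure to find links. Validating URLs is a completely different discussion: How do you validate a URL with a regular expression in Python?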
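As a quick illustration of relying only on the []() structure, here is a minimal sketch that drops the "http[s]?://" requirement, so relative paths and URLs without a scheme are found too. The sample text is made up for illustration, and ".+" is used instead of ".*" so that an empty "()" is not reported as a link.

```python
import re

# Hypothetical variant of the pattern: no scheme is required, so relative
# paths and "www." URLs are accepted as link targets.
link_name = r"[^\[]+"
link_url = r".+"  # greedy: runs to the last ")" on the line
markup_regex = rf"\[({link_name})]\(\s*({link_url})\s*\)"

text = """See [the docs](./docs/index.md) for details.
[a link](www.wiki.com/atopic_(subtopic))"""

links = re.findall(markup_regex, text)
for name, url in links:
    print(f"{name}: {url}")
```

Because "." does not match a newline, each line is searched up to its own last ")".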

Fixed example code:

import re

# Extract []() style links
link_name = r"[^\[]+"
link_url = "http[s]?://.+"
markup_regex = rf'\[({link_name})]\(\s*({link_url})\s*\)'

for match in re.findall(markup_regex, '[a link](https://www.wiki.com/atopic_(subtopic))'):
    name = match[0]
    url = match[1]
    print(url)
    # url will be https://www.wiki.com/atopic_(subtopic)

Note that I also changed link_name to avoid problems caused by a stray "[" somewhere in the text before the link.
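One caveat worth checking: because ".+" is greedy, it extends to the last ")" on the line, so two links on the same line end up in a single match. The sketch below reproduces this with the pattern above and then shows one possible alternative that needs only the built-in re module: a URL pattern allowing exactly one level of balanced parentheses. The alternative pattern and the sample line are illustrative assumptions, not part of the answer, and deeper nesting such as "a_(b_(c))" is not matched by it.

```python
import re

line = "[a](https://a.com/x_(y)) and [b](https://b.com)"

# Pattern from the answer: the greedy ".+" runs to the LAST ")" on the
# line, so both links end up in one match.
greedy = r"\[([^\[]+)]\(\s*(http[s]?://.+)\s*\)"
merged = re.findall(greedy, line)
print(merged)  # one tuple whose URL swallows " and [b](https://b.com"

# Illustrative alternative (not from the answer): a URL is a run of
# characters that are neither whitespace nor parentheses, optionally
# containing "(...)" groups with no parentheses inside them.
one_level = r"\[([^\[\]]+)\]\(\s*((?:[^()\s]|\([^()\s]*\))+)\s*\)"
fixed = re.findall(one_level, line)
print(fixed)  # two tuples, one per link
```

The two alternatives inside the group start with different characters, so the engine never has to choose between them, which keeps backtracking cheap.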

User

Answer 2 · edited 2024-10-01 05:00:30

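For URLs like these you need a recursive pattern, which only the newer third-party regex module supports (the built-in re module does not):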

import regex as re

data = """
It's very easy to make some words **bold** and other words *italic* with Markdown. 
You can even [link to Google!](http://google.com)
[a link](https://www.wiki.com/atopic_(subtopic))
"""

pattern = re.compile(r'\[([^][]+)\](\(((?:[^()]+|(?2))+)\))')

for match in pattern.finditer(data):
    description, _, url = match.groups()
    print(f"{description}: {url}")

This produces:

link to Google!: http://google.com
a link: https://www.wiki.com/atopic_(subtopic)

See a demo on regex101.com.

This cryptic little beauty boils down to:

\[([^][]+)\]           # capture anything between "[" and "]" into group 1
(\(                    # open group 2 and match "("
    ((?:[^()]+|(?2))+) # match anything not "(" nor ")" or recurse group 2
                       # capture the content into group 3 (the url)
\))                    # match ")" and close group 2

Note: the problem with this approach is that it does not work for URLs such as

[some nasty description](https://google.com/()
#                                          ^^^

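which may well be accepted by some Markdown flavors (CommonMark, for example, only allows an unbalanced parenthesis in a link destination when it is backslash-escaped). If you expect to run into URLs like this, use a proper Markdown parser.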
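A proper Markdown parser is the robust choice here. If you only need inline []() links and want to avoid both a parser dependency and the third-party regex module, one alternative is to skip regular expressions and count parentheses by hand. The sketch below is a hypothetical helper (the name extract_links is made up), not a Markdown parser: it ignores code spans, link titles, <...> destinations and reference-style links. In the spirit of CommonMark, it does not count a backslash-escaped parenthesis, and it treats an unclosed "(" as "not a link" rather than guessing where the URL ends.

```python
def extract_links(text: str) -> list[tuple[str, str]]:
    """Return (label, destination) pairs for inline [label](destination) links.

    Hypothetical helper, not a Markdown parser: it looks for "[", the next
    "](", and then counts parentheses until they balance. Escapes are left
    in the returned destination exactly as written.
    """
    links = []
    i = 0
    while (start := text.find("[", i)) != -1:
        close = text.find("](", start + 1)
        if close == -1:
            break  # no "](" left anywhere, so no more links
        label = text[start + 1:close]
        if "[" in label or "]" in label:
            # A stray bracket before the real link: retry from the next "[".
            i = start + 1
            continue
        depth, j = 1, close + 2
        while j < len(text) and depth:
            ch = text[j]
            if ch == "\\":
                j += 2  # skip the escaped character so it is never counted
                continue
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            j += 1
        if depth:
            # The "(" after "]" is never closed: not a link, move on.
            i = start + 1
            continue
        links.append((label, text[close + 2:j - 1].strip()))
        i = j
    return links


sample = r"""You can even [link to Google!](http://google.com)
[a link](https://www.wiki.com/atopic_(subtopic))
[deeper](https://e.com/a_(b_(c)))
[escaped](https://google.com/\()
[some nasty description](https://google.com/()"""

for label, url in extract_links(sample):
    print(f"{label}: {url}")
```

Nesting depth is not limited here. The unescaped "nasty" example is skipped rather than truncated, while the escaped form is returned with its backslash.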
