Regex for capturing scientific citations

2 answers

User

Answer 1 · edited 2024-09-30 18:16:11

You can use

re.findall(r'\([^()\d]*\d[^()]*\)', s)

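See the regex demo.

Details

\( - a ( character
[^()\d]* - zero or more characters other than (, ) and digits
\d - a digit
[^()]* - zero or more characters other than ( and )
\) - a ) character.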
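The breakdown above can also be written with re.VERBOSE, so that each piece of the pattern carries its own comment. This is only a minimal sketch: the name CITATION_RX and the sample string are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE.
# CITATION_RX is an illustrative name, not from the original answer.
CITATION_RX = re.compile(
    r"""
    \(          # a literal opening parenthesis
    [^()\d]*    # zero or more characters other than parentheses and digits
    \d          # one digit
    [^()]*      # zero or more characters other than parentheses
    \)          # a literal closing parenthesis
    """,
    re.VERBOSE,
)

samples = "No year (Author), with year (Author, 2000), digits only (42)"
print(CITATION_RX.findall(samples))  # => ['(Author, 2000)', '(42)']
```

Note that the pattern only requires one digit somewhere between the parentheses, so (42) is returned as well. If that is too permissive for your data, the pattern would need to be tightened.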


Python demo:

import re
rx = re.compile(r"\([^()\d]*\d[^()]*\)")
s = "Some (Author) and (Author 2000)"
print(rx.findall(s)) # => ['(Author 2000)']

To get the results without the parentheses, add a capturing group:

rx = re.compile(r"\(([^()\d]*\d[^()]*)\)")
                    ^                ^

See this Python demo.
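As a short sketch of what the capturing group changes: when the pattern contains exactly one group, re.findall returns the text of that group instead of the whole match. The sample string is the one used in the demo above.

```python
import re

rx = re.compile(r"\(([^()\d]*\d[^()]*)\)")
s = "Some (Author) and (Author 2000)"

# With one capturing group, findall returns the group text only.
print(rx.findall(s))  # => ['Author 2000']

# finditer gives access to both the whole match and the group.
for m in rx.finditer(s):
    print(m.group(0), "->", m.group(1))  # => (Author 2000) -> Author 2000
```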

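User

Answer 2 · edited 2024-09-30 18:16:11

Probably the most reliable way to handle this is to add boundaries to the expression, since it is likely to grow. For example, we can build character classes for each part of the data we want to collect: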

(?=\().([a-z]+)([\s,;]+?)([0-9]+)(?=\)).


Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?=\().([a-z]+)([\s,;]+?)([0-9]+)(?=\))."

test_str = "some text we wish before (Author) some text we wish after (Author 2000) some text we wish before (Author) some text we wish after (Author, 2000) some text we wish before (Author) some text we wish after (Author 2000) some text we wish before (Author) some text we wish after (Author; 2000)"

matches = re.finditer(regex, test_str, re.MULTILINE | re.IGNORECASE)

for match_num, match in enumerate(matches, start=1):
    print("Match {} was found at {}-{}: {}".format(match_num, match.start(), match.end(), match.group()))

    # Groups are numbered from 1; group 0 is the whole match.
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(group_num, match.start(group_num), match.end(group_num), match.group(group_num)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
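If only the author and year are needed, the same idea can be sketched with named groups. This is an illustrative variant, not the answer's own code: the group names author, sep and year are made up, and `(?=\().` and `(?=\)).` are written as the equivalent `\(` and `\)`.

```python
import re

# Named-group variant of the pattern above; the group names are illustrative.
rx = re.compile(
    r"\((?P<author>[a-z]+)(?P<sep>[\s,;]+?)(?P<year>[0-9]+)\)",
    re.IGNORECASE,
)

text = "before (Author) after (Author 2000), (Author, 2000) and (Author; 2000)"

# Collect (author, year) pairs; "(Author)" has no year and is skipped.
citations = [(m.group("author"), m.group("year")) for m in rx.finditer(text)]
print(citations)
```

Because the author part is `[a-z]+`, this only matches a single-word author name followed by a separator and digits. Something like (Smith and Jones 2000) would not match without extending the character classes.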


RegEx Circuit

jex.im visualizes regular expressions.
