Extracting the contents of parentheses from a string

3 answers

User

Answer 1 · edited 2024-05-19 13:33:05

An answer that is more explicit than the others, which I think fits what you need:

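import re
string = "Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Michael Pena (Frank Garcia)"
# Actor: two words; character: one or two words, so "(Samantha)" also matches
regex = re.compile(r'([a-zA-Z]+ [a-zA-Z]+) \(([a-zA-Z]+(?: [a-zA-Z]+)?)\)')
actor_character = regex.findall(string)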

I admit it's a bit ugly, but like I said, it's explicit.

User

Answer 2 · edited 2024-05-19 13:33:05

string = "Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Michael Pena (Frank Garcia)"

import re
pat = re.compile(r'([^(]+)\s*\(([^)]+)\)\s*(?:,\s*|$)')

lst = [(t[0].strip(), t[1].strip()) for t in pat.findall(string)]

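The compiled pattern is a bit tricky. It's a raw string, which keeps the backslashes from getting crazy. Here is what it means: start a match group; match any character that is not '(', one or more times; close the match group; match a literal '(' character; start another match group; match any character that is not ')', one or more times; close the match group; match a literal ')' character; then match any amount of white space (including none); then comes the really tricky part, a grouping that does not form a match group. Instead of starting with '(' it starts with '(?:' and then ends with ')' as usual. I used this grouping so I could put a vertical bar inside it to allow two alternatives: either a comma followed by any amount of white space, or the end of the line (the '$' character).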
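To see why that last grouping is written as '(?:...)' rather than a plain '(...)', here is a small comparison using the same sample string (the variable names are just for illustration). With an ordinary capturing group, findall() would report a third item per match holding whatever the separator matched.

```python
import re

s = "Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Michael Pena (Frank Garcia)"

# Non-capturing separator group: only the two name groups are reported.
non_capturing = re.findall(r'([^(]+)\s*\(([^)]+)\)\s*(?:,\s*|$)', s)

# Same pattern with a capturing separator group: findall() now also returns
# what the separator matched (", " or the empty string at end of line).
capturing = re.findall(r'([^(]+)\s*\(([^)]+)\)\s*(,\s*|$)', s)

print(non_capturing[0])  # ('Will Ferrell ', 'Nick Halsey')
print(capturing[0])      # ('Will Ferrell ', 'Nick Halsey', ', ')
print(capturing[-1])     # ('Michael Pena ', 'Frank Garcia', '')
```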

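Then I use pat.findall() to find every place in string where the pattern matches; because the pattern has two match groups, it returns a tuple for each match. I put that in a list comprehension and call .strip() on each item to clean off the white space (for example, ([^(]+) also grabs the space just before each '(').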
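The shape of what findall() returns depends on how many groups the pattern has: no groups gives the whole match as a string, one group gives that group's text as a string, and two or more give a tuple per match. A minimal sketch on a shortened sample string (the short patterns are illustrative; the two-group one is the pattern from this answer):

```python
import re

text = "Will Ferrell (Nick Halsey), Rebecca Hall (Samantha)"

# No groups: each item is the entire matched text.
no_groups = re.findall(r'\([^)]+\)', text)
# One group: each item is that group's text (a plain string).
one_group = re.findall(r'\(([^)]+)\)', text)
# Two groups: each item is a tuple with one string per group.
two_groups = re.findall(r'([^(]+)\s*\(([^)]+)\)\s*(?:,\s*|$)', text)
# Strip the space that ([^(]+) captures before each '('.
stripped = [(a.strip(), b.strip()) for a, b in two_groups]

print(no_groups)   # ['(Nick Halsey)', '(Samantha)']
print(one_group)   # ['Nick Halsey', 'Samantha']
print(two_groups)  # [('Will Ferrell ', 'Nick Halsey'), ('Rebecca Hall ', 'Samantha')]
print(stripped)    # [('Will Ferrell', 'Nick Halsey'), ('Rebecca Hall', 'Samantha')]
```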

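Of course, we could make the regular expression more complicated and have it return the names with the white space already stripped. The regex gets really hairy, though, so we'll use one of the coolest features of Python regular expressions: "verbose" mode, where you can spread a pattern over many lines and put comments wherever you like. We're using a raw triple-quoted string, so the backslashes are convenient and so are the multiple lines. Here you go: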

import re
s_pat = r'''
\s*  # any amount of white space
([^( \t]  # start match group; match one char that is not a '(' or space or tab
[^(]*  # match any number of non '(' characters
[^( \t])  # match one char that is not a '(' or space or tab; close match group
\s*  # any amount of white space
\(  # match an actual required '(' char (not in any match group)
\s*  # any amount of white space
([^) \t]  # start match group; match one char that is not a ')' or space or tab
[^)]*  # match any number of non ')' characters
[^) \t])  # match one char that is not a ')' or space or tab; close match group
\s*  # any amount of white space
\) # match an actual required ')' char (not in any match group)
\s*  # any amount of white space
(?:,|$)  # non-match group: either a comma or the end of a line
'''
pat = re.compile(s_pat, re.VERBOSE)

lst = pat.findall(string)

Man, that really isn't worth the trouble.

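Also, the above keeps any white space inside the names. You can easily normalize the white space, to make sure it's 100% consistent, by splitting on white space and rejoining with single spaces.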

string = '  Will   Ferrell  ( Nick\tHalsey ) , Rebecca Hall (Samantha), Michael\fPena (Frank Garcia)'

import re
pat = re.compile(r'([^(]+)\s*\(([^)]+)\)\s*(?:,\s*|$)')

def nws(s):
    """normalize white space.  Replaces all runs of white space by a single space."""
    return " ".join(s.split())

lst = [tuple(nws(item) for item in t) for t in pat.findall(string)]

print(lst)  # prints: [('Will Ferrell', 'Nick Halsey'), ('Rebecca Hall', 'Samantha'), ('Michael Pena', 'Frank Garcia')]

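Now string has silly white space: multiple spaces, a tab, and even a form feed ('\f'). The code above cleans it up so that the words in each name are separated by a single space.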
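The verbose pattern earlier in this answer relies on how re.VERBOSE treats white space: bare white space in the pattern is ignored, but it is kept inside a character class (which is why [^( \t] still excludes the space) or when escaped with a backslash, and '#' starts a comment only outside a character class. A small sketch of those rules with toy patterns (not part of the original answer):

```python
import re

# Bare spaces are ignored in verbose mode, so this pattern is just "ab".
p1 = re.compile(r"a b   # trailing comment", re.VERBOSE)
# An escaped space is a literal space.
p2 = re.compile(r"a\ b", re.VERBOSE)
# Inside a character class, the space and '#' are literal characters.
p3 = re.compile(r"a[ #]b", re.VERBOSE)

print(bool(p1.fullmatch("ab")))   # True
print(bool(p1.fullmatch("a b")))  # False
print(bool(p2.fullmatch("a b")))  # True
print(bool(p3.fullmatch("a b")), bool(p3.fullmatch("a#b")))  # True True
```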

User

Answer 3 · edited 2024-05-19 13:33:05

A good place for a regular expression:

>>> import re
>>> pat = r"([^,\(]*)\((.*?)\)"
>>> re.findall(pat, "Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Michael Pena (Frank Garcia)")
[('Will Ferrell ', 'Nick Halsey'), (' Rebecca Hall ', 'Samantha'), (' Michael Pena ', 'Frank Garcia')]
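The .*? in that pattern is a non-greedy (lazy) quantifier, which is what stops each character name at the first ')'. A quick sketch comparing it with greedy .* on the same string, followed by a .strip() on the actor names to drop the stray spaces (the stripping step is an addition, not part of this answer):

```python
import re

s = "Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Michael Pena (Frank Garcia)"

# Greedy: .* runs to the LAST ')' in the string, producing one big match.
greedy = re.findall(r"\((.*)\)", s)
# Lazy: .*? stops at the FIRST ')' after each '('.
lazy = re.findall(r"\((.*?)\)", s)

print(greedy)  # ['Nick Halsey), Rebecca Hall (Samantha), Michael Pena (Frank Garcia']
print(lazy)    # ['Nick Halsey', 'Samantha', 'Frank Garcia']

# The answer's pattern, with surrounding spaces removed from each actor name.
pairs = [(actor.strip(), role) for actor, role in re.findall(r"([^,(]*)\((.*?)\)", s)]
print(pairs)  # [('Will Ferrell', 'Nick Halsey'), ('Rebecca Hall', 'Samantha'), ('Michael Pena', 'Frank Garcia')]
```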
