Python regex for the text after a quoted string

Posted 2024-05-03 05:06:30


I'm trying to write a Python program that extracts the artist's name from Pandora tweets. For example, given this tweet:

I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.

I only want to get the name Luther Vandross. I don't know much about regex, so I tried the following code:

print(re.findall(r'".+?" by [\w+]+', text))

But the result is ['"I Can Make It Better" by Luther'].

How can I write a regular expression in Python that extracts just the name?
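To see why the attempt above stops at "Luther": `[\w+]` is a character class (one word character or a literal `+`), not a group, so `[\w+]+` cannot cross the space after "Luther". And because the pattern has no capturing group, `re.findall` returns the whole match, quoted title included. A minimal check, reusing the tweet from the question:

```python
import re

text = ("I'm listening to \"I Can Make It Better\" by Luther Vandross "
        "on Pandora #pandora http://t.co/ieDbLC393F.")

# [\w+] matches a single word character or a literal '+'.
# It cannot match a space, so the match ends right after "Luther".
# With no capturing group, findall returns the entire match.
result = re.findall(r'".+?" by [\w+]+', text)
print(result)  # ['"I Can Make It Better" by Luther']
```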


3 answers

You need to use a capturing group.

print(re.findall(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})', text))

I used a repetition quantifier because a name may consist of just a first name; a first and last name; or a first, middle, and last name.
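One way to see both the capturing group and the `{0,2}` quantifier at work is to run the pattern on a few tweets. The first four tweets below are made-up examples for illustration only (the last one comes from the test cases in the next answer). They also show the pattern's limits: names longer than three words are truncated, and names that don't start with a capital letter (such as `@foofighters`) are not matched at all.

```python
import re

pattern = re.compile(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})')

# Hypothetical sample tweets, used only to exercise the quantifier.
tweets = [
    "I'm listening to \"Song A\" by Adele on Pandora",
    "I'm listening to \"Song B\" by Luther Vandross on Pandora",
    "I'm listening to \"Song C\" by John Paul Jones on Pandora",
    "I'm listening to \"Song D\" by One Two Three Four on Pandora",
    "I'm listening to \"Everlong\" by @foofighters on #Pandora",
]

# With exactly one capturing group, findall returns only the group's text.
results = [pattern.findall(t) for t in tweets]
for r in results:
    print(r)
# ['Adele']
# ['Luther Vandross']
# ['John Paul Jones']
# ['One Two Three']   (at most three capitalized words)
# []                  ('@' is not [A-Z])
```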

>>> s = '''I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.'''

>>> import re
>>> m = re.search('to "?(.*?)"? by (.*?) on #?Pandora', s)
>>> m
<_sre.SRE_Match object; span=(14, 69), match='to "I Can Make It Better" by Luther Vandross on P>
>>> m.groups()
('I Can Make It Better', 'Luther Vandross')

More test cases:

>>> tests = [
    '''I'm listening to "Don't Turn Out The Lights (D.T.O.T.L.)" by NKOTBSB on #Pandora''',
    '''I'm listening to G.O.D. Remix by Canton Jones on #Pandora''',
    '''I'm listening to "It's Been Awhile" by @staindmusic on Pandora #pandora http://pdora.co/R1OdxE''',
    '''I'm listening to "Everlong" by @foofighters on #Pandora http://pdora.co/1eANfI0''',
    '''I'm listening to "El Preso (2000)" by Fruko Y Sus Tesos on #Pandora http://pdora.co/1GtOHC1''',
    '''I'm listening to "Cat Daddy" by Rej3ctz on #Pandora http://pdora.co/1eALNpc''',
    '''I'm listening to "Space Age Pimpin'" by 8 Ball & MJG on Pandora #pandora http://pdora.co/1h8swun'''
]
>>> expr = re.compile('to "?(.*?)"? by (.*?) on #?Pandora')
>>> for s in tests:
        print(expr.search(s).groups())

("Don't Turn Out The Lights (D.T.O.T.L.)", 'NKOTBSB')
('G.O.D. Remix', 'Canton Jones')
("It's Been Awhile", '@staindmusic')
('Everlong', '@foofighters')
('El Preso (2000)', 'Fruko Y Sus Tesos')
('Cat Daddy', 'Rej3ctz')
("Space Age Pimpin'", '8 Ball & MJG')
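One caveat with `expr.search(s).groups()`: when a tweet does not match, `search` returns `None` and `.groups()` raises `AttributeError`. A minimal sketch of a safer wrapper, assuming the same pattern as above; the function name `parse_pandora_tweet` and the group names `title` and `artist` are hypothetical, chosen here for readability:

```python
import re

# Same pattern as above, rewritten with named groups.
PANDORA_RE = re.compile(r'to "?(?P<title>.*?)"? by (?P<artist>.*?) on #?Pandora')


def parse_pandora_tweet(tweet):
    """Return (title, artist), or None if the tweet doesn't match."""
    m = PANDORA_RE.search(tweet)
    if m is None:
        return None
    return m.group("title"), m.group("artist")


print(parse_pandora_tweet("I'm listening to \"Everlong\" by @foofighters on #Pandora"))
print(parse_pandora_tweet("Just a regular tweet"))
# ('Everlong', '@foofighters')
# None
```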

Your regex is close, but you can change the delimiters to `" by` and `on`. You also need parentheses to capture the group.

You can use a regex like this:

" by (.+?) on

The idea behind this regex is to use a simple non-greedy pattern to capture everything between `" by` and `on`.
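To illustrate why the non-greedy `.+?` matters, compare it with a greedy `.+` on a tweet in which ` on` appears twice. The tweet below is a made-up example:

```python
import re

# Hypothetical tweet in which " on" occurs twice after the artist name.
tweet = "I'm listening to \"Song\" by Some Artist on Pandora on my phone"

# Greedy: .+ runs to the end of the string, then backtracks to the LAST " on".
greedy = re.search(r'" by (.+) on', tweet).group(1)
# Non-greedy: .+? stops at the FIRST " on".
lazy = re.search(r'" by (.+?) on', tweet).group(1)

print(greedy)  # Some Artist on Pandora
print(lazy)    # Some Artist
```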

Match information:

MATCH 1
1.  [43-58] `Luther Vandross`
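The `[43-58]` offsets above are the start and end indices of group 1 in the tweet (the end index is exclusive). A small sketch that reproduces them with `Match.span()`:

```python
import re

test_str = ("I'm listening to \"I Can Make It Better\" by Luther Vandross "
            "on Pandora #pandora http://t.co/ieDbLC393F.")

m = re.search(r'" by (.+?) on', test_str)
# span(1) gives (start, end) of capturing group 1; end is exclusive.
start, end = m.span(1)
print(start, end, test_str[start:end])  # 43 58 Luther Vandross
```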

Code:

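import re

# Non-greedy group: capture the shortest text between '" by ' and ' on'
p = re.compile(r'" by (.+?) on')
test_str = "I'm listening to \"I Can Make It Better\" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.\n"

m = p.search(test_str)
if m:
    print(m.group(1))  # Luther Vandross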
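The same pattern can also be applied to several tweets at once with `re.findall`. Because `.` does not match a newline, each match stays within a single line. Note that the pattern requires a closing quote immediately before ` by`, so unquoted titles such as `G.O.D. Remix` from the test cases above are skipped. The tweets below are adapted from the examples in this thread:

```python
import re

p = re.compile(r'" by (.+?) on')

# One tweet per line; '.' never crosses the newline between them.
text = (
    "I'm listening to \"I Can Make It Better\" by Luther Vandross on Pandora #pandora\n"
    "I'm listening to G.O.D. Remix by Canton Jones on #Pandora\n"
    "I'm listening to \"Everlong\" by @foofighters on #Pandora\n"
)

# The second line has no '"' right before " by", so it yields no match.
artists = p.findall(text)
print(artists)  # ['Luther Vandross', '@foofighters']
```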
