Finding a pattern restricted to a given set of words

Posted 2024-09-30 14:16:44


Suppose I want to write a pattern that captures sentences like this:

<person> was one of the few <profession> in <city> whom everybody admired. 

The allowed variations are as follows:

<person> is a member of {Michael, Jack, Joe, Maria, Susan}.
<profession> is any of {painters, actors}.
<city> is the regexp pattern `[$k|K]a\w+`.

So the pattern should cover sentences such as:

Jack was one of the few painters in Kansan whom everybody admired. 
Michael was one of the few actors in Karlsruhe whom everybody admired.

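How can I model this in Python? As far as I know, regex alone cannot capture such a pattern. I could probably write a context-free grammar, but before going down that road I thought I would ask whether there is a simpler way.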


3 answers

Maybe you want something like this:

/^(Michael|Susan|Maria|Jack|Joe).*?(painters|actors).*?([P|K]a\w+).*$/gm


PS: I took $k to be a variable that gets replaced with an actual value (P in my example). If you meant something different, I will revise the regex accordingly.
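In Python, the pattern above might be applied roughly as follows. This is only a sketch: it keeps P as the stand-in for $k, uses re.MULTILINE in place of the /m flag and finditer() in place of /g, and the sample `text` and variable names are made up for illustration.

```python
import re

# Same pattern as above; re.MULTILINE plays the role of /m,
# and finditer() plays the role of /g (find every match).
pattern = re.compile(
    r'^(Michael|Susan|Maria|Jack|Joe).*?(painters|actors).*?([P|K]a\w+).*$',
    re.MULTILINE,
)

# Hypothetical sample input, one sentence per line.
text = (
    "Jack was one of the few painters in Kansan whom everybody admired.\n"
    "Michael was one of the few actors in Karlsruhe whom everybody admired.\n"
    "Jone was one of the few painters in Kansan whom everybody admired.\n"
)

# Collect the three captured groups of every matching line.
matches = [m.groups() for m in pattern.finditer(text)]
for person, profession, city in matches:
    print(person, profession, city)
```

Note that the `.*?` gaps make this pattern looser than the sentence template in the question: any text may appear between the name, the profession and the city.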

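Warning

Unless the entries in an alternation group are sorted by length (longest to shortest), a regex-based solution can fail when one entry is a prefix of another. In Python, use something like:

persons.sort(key=len, reverse=True)  # Python 3; the cmp-style sort(lambda x, y: ...) form only works in Python 2

Why? A group such as (Maria|Joe|Jack|Mariano) matches only Maria within the string Mariano unless something after the group forces the engine to backtrack. Alternatives are tried left to right and the first one that succeeds wins, much like a short-circuit OR in most programming languages.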
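A short sketch of the ordering issue, using the example names from the warning. The variable names are illustrative, and re.escape is added here as a precaution in case an entry contains regex metacharacters.

```python
import re

names = ['Maria', 'Joe', 'Jack', 'Mariano']

# Unsorted: 'Maria' is tried before 'Mariano' and wins.
unsorted_re = re.compile('|'.join(map(re.escape, names)))

# Longest first, so 'Mariano' is tried before its prefix 'Maria'.
by_length = sorted(names, key=len, reverse=True)
sorted_re = re.compile('|'.join(map(re.escape, by_length)))

unsorted_hit = unsorted_re.match('Mariano').group(0)
sorted_hit = sorted_re.match('Mariano').group(0)
print(unsorted_hit)  # Maria
print(sorted_hit)    # Mariano

# A trailing \b (or any required text after the group) also forces
# the engine to backtrack past 'Maria' and try 'Mariano'.
bounded_hit = re.match(r'(?:Maria|Joe|Jack|Mariano)\b', 'Mariano').group(0)
print(bounded_hit)   # Mariano
```

In the sentence template from the question, the literal " was one of the few" after the name has the same effect as the `\b`, so sorting matters most when the group sits at the end of a pattern or is followed by something optional.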

This regex matches your examples.

(\w+) was one of the few (painters|actors) in ([$k|K]a\w+) whom everybody admired.

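Edit: added an example of how to check the captured group

Suppose you want to check whether the name is in a list of 1,000 names; a regex alone is not well suited to that. You can capture the result of this regex and add an extra check.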

import re

input_strs = ['Jack was one of the few painters in Kansan whom everybody admired.',
              'Michael was one of the few actors in Karlsruhe whom everybody admired.']

allowed_names = ['Michael', 'John']

# The final period is escaped so that it matches a literal '.'
pattern = re.compile(r'(\w+) was one of the few (painters|actors) in ([$k|K]a\w+) whom everybody admired\.')

# 'sentence' rather than 'input', to avoid shadowing the built-in
for sentence in input_strs:
    m = pattern.match(sentence)
    if m:
        # check if name is in the list
        name = m.group(1)
        print('name: ' + name)
        if name in allowed_names:
            print('ok')
        else:
            print('fail')

Here you go:

>>> import re
>>> persons = ['Michael', 'Jack', 'Joe', 'Maria', 'Susan']
>>> professions = ['painters', 'actors']
>>> # Each alternation is wrapped in a non-capturing group; without it,
>>> # `|` splits the whole pattern and a bare name such as 'Jack' matches on its own.
>>> regex = re.compile(r'(?:{person}) was one of the few (?:{profession}) in {city} whom everybody admired\.'
...                    .format(person='|'.join(persons),
...                            profession='|'.join(professions),
...                            city=r'[$k|K]a\w+'))

>>> a = ['Jack was one of the few painters in Kansan whom everybody admired.',
...      'Michael was one of the few actors in Karlsruhe whom everybody admired.',
...      'Jone was one of the few painters in Kansan whom everybody admired.',
...      'Susan was one of the few foo in Kansan whom everybody admired.',
...      'Joe was one of the few actors in Kansan whom everybody admired.']

>>> for i in a:
...     print(bool(regex.search(i)))
...
True
True
False
False
True
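The same idea can be packaged as a small helper that returns the three parts by name. This is a sketch under a few assumptions: `build_pattern` and `parse` are hypothetical names, the city pattern is taken unchanged from the question, and fullmatch() is used on the assumption that each input is exactly one sentence.

```python
import re

def build_pattern(persons, professions, city=r'[$k|K]a\w+'):
    """Build a compiled regex for the sentence template (hypothetical helper)."""
    def alt(words):
        # Escape each word and try longer alternatives first.
        ordered = sorted(words, key=len, reverse=True)
        return '|'.join(re.escape(w) for w in ordered)

    return re.compile(
        r'(?P<person>{0}) was one of the few (?P<profession>{1}) '
        r'in (?P<city>{2}) whom everybody admired\.'.format(
            alt(persons), alt(professions), city)
    )

pattern = build_pattern(
    ['Michael', 'Jack', 'Joe', 'Maria', 'Susan'],
    ['painters', 'actors'],
)

def parse(sentence):
    """Return the named parts as a dict, or None if the sentence does not fit."""
    m = pattern.fullmatch(sentence)
    return m.groupdict() if m else None

good = parse('Jack was one of the few painters in Kansan whom everybody admired.')
bad = parse('Jone was one of the few painters in Kansan whom everybody admired.')
print(good)
print(bad)
```

Named groups keep the calling code readable when the word lists grow, and building the alternations from lists means the allowed names can be loaded from a file or database instead of being hard-coded.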
