Python: trouble matching a regex pattern at the start or end of a string

Posted 2024-10-02 16:34:35


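I'm having a hard time with Python regular expressions. I want to find any of N, S, E, W, NB, SB, EB, WB, including at the start or end of a string. My regex finds them easily in the middle of a string, but fails at the start and at the end.

Can anyone tell me what is wrong with dirPattern in the code sample below?

Note: I realize I still have some other issues to handle (for example "W of"), but I think I know how to modify the regex for those.

Thanks in advance.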

import re

nameList = ['Boulder Highway and US 95 NB',  'Boulder Hwy and US 95 SB', 
'Buffalo and Summerlin N', 'Charleston and I-215 W', 'Eastern and I-215 S', 'Flamingo and NB I-15',
'S Buffalo and Summerlin', 'Flamingo and SB I-15', 'Gibson and I-215 EB', 'I-15 at 3.5 miles N of Jean',
'I-15 NB S I-215 (dual)', 'I-15 SB 4.3 mile N of Primm', 'I-15 SB S of Russell', 'I-515 SB at Eastern W',
'I-580 at I-80 N E', 'I-580 at I-80 S W', 'I-80 at E 4TH St Kietzke Ln', 'I-80 East of W McCarran',
'LV Blvd at I-215 S', 'S Buffalo and I-215 W', 'S Decatur and I-215 WB', 'Sahara and I-15 East',
'Sands and Wynn South Gate', 'Silverado Ranch and I-15 (west side)']

dirMap = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'}

dirPattern = re.compile(r'[ ^]([NSEW])B?[ $]')

print('name\tmatch\tdirSting\tdirection')
for name in nameList:
    match = dirPattern.search(name)
    direction = None
    dirString = None
    if match:
        dirString = match.group(1)
        if dirString in dirMap:
            direction = dirMap[dirString]
    print('%s\t%s\t%s\t%s'%(name, match, dirString, direction))

Some examples of the expected output:

name match dirSting direction

Boulder Highway and US 95 NB <_sre.SRE_Match object at 0x7f68af836648> N North

Boulder Hwy and US 95 SB <_sre.SRE_Match object at 0x7f68ae836648> S South

Buffalo and Summerlin N <_sre.SRE_Match object at 0x7f68af826648> N North

Charleston and I-215 W <_sre.SRE_Match object at 0x7f68cf836648> W West

Flamingo and NB I-15 <_sre.SRE_Match object at 0x7f68af8365d0> N North

S Buffalo and Summerlin <_sre.SRE_Match object at 0x7f68aff36648> S South

Gibson and I-215 EB <_sre.SRE_Match object at 0x7f68afa36648> E East

However, the start-of-string and end-of-string examples give:

Boulder Highway and US 95 NB None None None


2 Answers

The modified regular expression in this code does the job. It also handles cases such as "W of", "at E", and similar:

import re

nameList = ['Boulder Highway and US 95 NB',  'Boulder Hwy and US 95 SB', 
'Buffalo and Summerlin N', 'Charleston and I-215 W', 'Eastern and I-215 S', 'Flamingo and NB I-15',
'S Buffalo and Summerlin', 'Flamingo and SB I-15', 'Gibson and I-215 EB', 'I-15 at 3.5 miles N of Jean',
'I-15 NB S I-215 (dual)', 'I-15 SB 4.3 mile N of Primm', 'I-15 SB S of Russell', 'I-515 SB at Eastern W',
'I-580 at I-80 N E', 'I-580 at I-80 S W', 'I-80 at E 4TH St Kietzke Ln', 'I-80 East of W McCarran',
'LV Blvd at I-215 S', 'S Buffalo and I-215 W', 'S Decatur and I-215 WB', 'Sahara and I-15 East',
'Sands and Wynn South Gate', 'Silverado Ranch and I-15 (west side)']

dirMap = {'N': 'North', 'S': 'South', 'E': 'East', 'W': 'West'}

dirPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B?(?! of )(?: |$)')

print('name\tdirSting\tdirection')
for name in nameList:
    match = dirPattern.search(name)
    direction = None
    dirString = None
    if match:
        dirString = match.group(1)
        direction = dirMap.get(dirString)
    print('> %s\t\t%s\t%s'%(name, dirString, direction))

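The regular expression can be read as:

(?:^| ) start of the string, or a space

(?<! at ) not preceded by " at "

(?<! of ) not preceded by " of "

([NSEW]) any one of "N", "S", "E", "W" (this is what ends up in match.group(1))

B? optionally followed by "B" (as in "bound", e.g. "NB" for northbound)

(?! of ) not followed by " of "

(?: |$) a space, or the end of the string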
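As a sketch of the breakdown above, the same pattern can be written with re.VERBOSE so that each piece carries its own comment. The names dir_pattern and find_direction are made up for this illustration and are not part of the original answer; the literal spaces are written as [ ] because verbose mode ignores unescaped whitespace.

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
# Unescaped whitespace is ignored in verbose mode, so each literal space is [ ].
dir_pattern = re.compile(r'''
    (?:^|[ ])        # start of the string, or a space
    (?<![ ]at[ ])    # not preceded by " at "
    (?<![ ]of[ ])    # not preceded by " of "
    ([NSEW])         # one direction letter, captured as group 1
    B?               # optional "B", as in "NB" for northbound
    (?![ ]of[ ])     # not followed by " of "
    (?:[ ]|$)        # a space, or the end of the string
''', re.VERBOSE)


def find_direction(name):
    """Return the first direction letter found in name, or None."""
    match = dir_pattern.search(name)
    return match.group(1) if match else None


print(find_direction('Boulder Highway and US 95 NB'))  # direction at the end
print(find_direction('S Buffalo and Summerlin'))       # direction at the start
print(find_direction('I-80 at E 4TH St Kietzke Ln'))   # "at E" is rejected
```

Keeping the pattern in verbose form may make it easier to add further exclusions later, since each lookaround sits on its own line.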

The final output is:

Boulder Highway and US 95 NB N North

Boulder Hwy and US 95 SB S South

Buffalo and Summerlin N N North

Charleston and I-215 W W West

Eastern and I-215 S S South

Flamingo and NB I-15 N North

S Buffalo and Summerlin S South

Flamingo and SB I-15 S South

Gibson and I-215 EB E East

I-15 at 3.5 miles N of Jean None None

I-15 NB S I-215 (dual) N North

I-15 SB 4.3 mile N of Primm S South

I-15 SB S of Russell S South

I-515 SB at Eastern W S South

I-580 at I-80 N E N North

I-580 at I-80 S W S South

I-80 at E 4TH St Kietzke Ln None None

I-80 East of W McCarran None None

LV Blvd at I-215 S S South

S Buffalo and I-215 W S South

S Decatur and I-215 WB S South

Sahara and I-15 East None None

Sands and Wynn South Gate None None

Silverado Ranch and I-15 (west side) None None

Side note: I decided that I don't want the end-of-string cases. For that, the regular expression should be:

dirPattern = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B? (?!of )')
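To illustrate how the two patterns from this answer differ, here is a small comparison on a few names from the list. The helper first_dir and the variable names with_end and without_end are hypothetical and only serve this sketch.

```python
import re

# Pattern that accepts a direction at the end of the string.
with_end = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B?(?! of )(?: |$)')
# Side-note variant: a space must follow, so end-of-string matches are dropped.
without_end = re.compile(r'(?:^| )(?<! at )(?<! of )([NSEW])B? (?!of )')


def first_dir(pattern, name):
    """Return group 1 of the first match of pattern in name, or None."""
    match = pattern.search(name)
    return match.group(1) if match else None


samples = ['Boulder Highway and US 95 NB',
           'Flamingo and NB I-15',
           'S Buffalo and I-215 W']
for name in samples:
    print('%s\t%s\t%s' % (name, first_dir(with_end, name),
                          first_dir(without_end, name)))
```

Only the first sample should differ between the two patterns, because its direction is the last token of the string.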

You need to use lookarounds.

dirPattern = re.compile(r'(?<!\S)([NSEW])B?(?!\S)')

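[ ^] matches a space or a literal caret character; inside a character class, ^ and $ are ordinary characters here, not anchors. The negative lookbehind (?<!\S) asserts that the match is not preceded by a non-whitespace character, so it succeeds both after whitespace and at the start of the string. (?!\S) asserts that the match is not followed by a non-whitespace character, so it succeeds both before whitespace and at the end of the string.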
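A minimal sketch of the difference, using the pattern from the question and the lookaround pattern from this answer. The variable names original and fixed are chosen for this example only.

```python
import re

# In the question's pattern, ^ and $ sit inside [...] and are literal characters.
original = re.compile(r'[ ^]([NSEW])B?[ $]')
# Lookaround version: no non-whitespace character directly before or after.
fixed = re.compile(r'(?<!\S)([NSEW])B?(?!\S)')

name = 'Boulder Highway and US 95 NB'
print(original.search(name))            # no match: nothing follows "NB"
print(fixed.search(name).group(1))      # matches the trailing "NB"

# The question's pattern does match when literal ^ and $ characters are present.
print(original.search('^N$').group(1))
```

Note that this lookaround pattern does not by itself exclude cases such as "N of"; those would still need the extra lookarounds discussed in the other answer.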

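I used negative lookarounds rather than positive ones because Python's built-in re module does not support (?<=^| ): a lookbehind must match a fixed width, and the two alternatives differ in width (^ is zero-width, the space is one character wide).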
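The limitation can be checked directly. The sketch below assumes the standard re module; the variable lookbehind_error is introduced only for this demonstration.

```python
import re

# A positive lookbehind whose alternatives have different widths is rejected
# by the standard re module at compile time.
try:
    re.compile(r'(?<=^| )([NSEW])B?(?= |$)')
    lookbehind_error = None
except re.error as exc:
    lookbehind_error = str(exc)
print(lookbehind_error)

# The negative form needs no alternation, so it compiles without complaint.
fixed = re.compile(r'(?<!\S)([NSEW])B?(?!\S)')
print(fixed.search('S Buffalo and Summerlin').group(1))
```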
