Simpler, safer string manipulation in Python

import re

checkstr = 'http://www.trulia.com/profile/agent-name-agent-orlando-fl-24408364/'
state = ''
citystrs = re.findall(r'-agent-(.*)-\d', checkstr)[0:1]
print(citystrs)
for citystr in citystrs:
    if '-' in citystr:
        if len(citystr.split('-')[-1]) == 2:
            # last dash-separated chunk looks like a two-letter state code
            state = citystr.split('-')[-1].upper().strip()
            city = citystr.upper().replace(state, '')
            city = city.replace('-', ' ').title().strip()
        else:
            city = citystr.replace('-', ' ').title().strip()
    else:
        city = citystr.title().strip()
    print(city, state)

3 answers

User

Answer 1 · edited 2024-10-02 16:32:28

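When you are only looking for the first match, re.search is clearer than findall.
If there can be multiple matches (as your use of [0:1] suggests), note that .* is greedy. For example, from the string -agent-orlando-fl-24408364-agent-orlando-fl-24408364 the regex captures orlando-fl-24408364-agent-orlando-fl. Use .*? instead.
The rpartition string method splits at the last occurrence of the separator and always returns three strings, which makes corner cases easier to handle.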
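Both points can be checked directly on the example string from this answer. This is a minimal Python 3 sketch; the variable names are only illustrative.

```python
import re

doubled = "-agent-orlando-fl-24408364-agent-orlando-fl-24408364"

# Greedy .* runs to the end of the string, then backtracks to the LAST "-<digit>"
greedy = re.search(r"-agent-(.*)-\d", doubled).group(1)
# Lazy .*? stops at the FIRST "-<digit>"
lazy = re.search(r"-agent-(.*?)-\d", doubled).group(1)

print(greedy)  # orlando-fl-24408364-agent-orlando-fl
print(lazy)    # orlando-fl

# rpartition always returns a 3-tuple, even when the separator is missing
print("orlando-fl".rpartition("-"))  # ('orlando', '-', 'fl')
print("orlando".rpartition("-"))     # ('', '', 'orlando')
```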

Suggested code:

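import re

m = re.search(r'-agent-(.*?)-\d', checkstr)
if m:
    citystr = m.group(1)
    # split at the last dash: 'orlando-fl' -> ('orlando', '-', 'fl')
    city, _, state = citystr.rpartition('-')
    if len(state) != 2:
        # no two-letter state suffix: treat the whole string as the city
        city = citystr
        state = ''
    city = city.replace('-', ' ').title()
    state = state.upper()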
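To try the suggested code on several inputs, it can be wrapped in a function. This is a sketch: parse_agent_url is a hypothetical name, and the two example.com URLs are made-up inputs, one with a multi-word city and one without a state.

```python
import re


def parse_agent_url(checkstr):
    """Hypothetical wrapper around the suggested code; returns (city, state) or None."""
    m = re.search(r"-agent-(.*?)-\d", checkstr)
    if not m:
        return None
    citystr = m.group(1)
    city, _, state = citystr.rpartition("-")
    if len(state) != 2:  # last chunk is not a two-letter state code
        city, state = citystr, ""
    return city.replace("-", " ").title(), state.upper()


print(parse_agent_url("http://www.trulia.com/profile/agent-name-agent-orlando-fl-24408364/"))
# ('Orlando', 'FL')
print(parse_agent_url("http://example.com/profile/jane-agent-new-york-ny-123/"))
# ('New York', 'NY')
print(parse_agent_url("http://example.com/profile/jane-agent-orlando-123/"))
# ('Orlando', '')
```

Because rpartition splits at the last dash, a multi-word city such as new-york keeps its dash until the final replace turns it into a space.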

User

Answer 2 · edited 2024-10-02 16:32:28

Why not use slicing?

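if '-' in citystr:
    # split at the first dash: 'orlando-fl' -> 'Orlando', 'FL'
    sep_index = citystr.find('-')
    city = citystr[0:sep_index].title()
    state = citystr[sep_index+1:].upper()
else:
    # no dash: state keeps its earlier value ('' in the question's code)
    city = citystr.title()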
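One thing worth checking with this approach is that find returns the first dash, so the split happens there. The sketch below wraps the slicing code in a hypothetical function, split_first_dash, and feeds it a made-up multi-word city next to the rpartition variant from the first answer.

```python
def split_first_dash(citystr):
    """The slicing approach from this answer, wrapped in a function (hypothetical name)."""
    state = ""
    if "-" in citystr:
        sep_index = citystr.find("-")  # index of the FIRST dash
        city = citystr[0:sep_index].title()
        state = citystr[sep_index + 1:].upper()
    else:
        city = citystr.title()
    return city, state


print(split_first_dash("orlando-fl"))   # ('Orlando', 'FL')
print(split_first_dash("new-york-ny"))  # ('New', 'YORK-NY')

# rpartition splits at the LAST dash instead, keeping multi-word cities together
city, _, state = "new-york-ny".rpartition("-")
print(city.replace("-", " ").title(), state.upper())  # New York NY
```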

Using timeit (number=10000):

yours : 3.56353430347
mine :  1.04823075931
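The answer does not show the exact benchmark code, so the following is only a sketch of how such a comparison could be set up with timeit. Both function names are hypothetical and their bodies are simplified versions of the two approaches, so the absolute numbers will depend on the machine and will not necessarily match those above.

```python
import re
import timeit

checkstr = "http://www.trulia.com/profile/agent-name-agent-orlando-fl-24408364/"


def with_regex_and_replace():
    # Roughly the question's approach: findall, then str.replace
    citystr = re.findall(r"-agent-(.*)-\d", checkstr)[0]
    state = citystr.split("-")[-1].upper()
    city = citystr.upper().replace(state, "").replace("-", " ").title().strip()
    return city, state


def with_slicing():
    # The slicing approach from this answer (splits at the first dash)
    citystr = re.findall(r"-agent-(.*)-\d", checkstr)[0]
    sep_index = citystr.find("-")
    return citystr[:sep_index].title(), citystr[sep_index + 1:].upper()


for fn in (with_regex_and_replace, with_slicing):
    # timeit.timeit accepts a callable and returns the total seconds for `number` calls
    print(fn.__name__, timeit.timeit(fn, number=10000))
```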

User

Answer 3 · edited 2024-10-02 16:32:28

I would do it like this:

import re

# city = everything up to the next dash; "-state" is optional
reg = re.compile(r'-agent-(?P<city>[^-]*)(?:-(?P<state>[^-]*))?-\d')

checkstr = 'http://www.trulia.com/profile/agent-name-agent-orlando-fl-24408364/'

m = reg.search(checkstr)

city = m.group('city').title()
# an optional group that did not match returns None
state = m.group('state').upper() if m.group('state') else ''

print(city, state)

If you need to use the pattern several times, you can compile it once with re.compile.
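A minimal sketch of compiling once and reusing the pattern over several strings; the second URL is a made-up input. Python's re module also caches recently used patterns internally, so the main benefit of re.compile here is keeping the pattern defined in one place.

```python
import re

# Compiled once, then reused for every string
reg = re.compile(r"-agent-(?P<city>[^-]*)(?:-(?P<state>[^-]*))?-\d")

urls = [  # hypothetical sample inputs
    "http://www.trulia.com/profile/agent-name-agent-orlando-fl-24408364/",
    "http://example.com/profile/jane-agent-tampa-fl-1/",
]

results = []
for url in urls:
    m = reg.search(url)
    if m:
        results.append((m.group("city").title(), (m.group("state") or "").upper()))

print(results)  # [('Orlando', 'FL'), ('Tampa', 'FL')]
```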

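I don't use .*, which is very permissive and causes backtracking. Instead I use [^-]* (zero or more characters that are not a dash), which stops before the first dash.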
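The difference shows up when the capture is followed by a dash. In this sketch (the test string is a trimmed version of the URL above), .* swallows the state, while [^-]* cannot cross a dash and stops at the city.

```python
import re

s = "-agent-orlando-fl-24408364"

# .* first runs to the end of the string, then backtracks to the LAST "-"
print(re.search(r"-agent-(.*)-", s).group(1))     # orlando-fl
# [^-]* cannot match a dash, so the group ends at the FIRST "-"
print(re.search(r"-agent-([^-]*)-", s).group(1))  # orlando
```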

The state and the dash in front of it are in an optional group: (?:-(?P<state>[^-]*))?. So the pattern still succeeds when the string has no state part.
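A short sketch with and without a state part (both strings are trimmed, illustrative inputs). When the optional group does not participate in the match, group('state') returns None; Match.groupdict accepts a default value to substitute instead.

```python
import re

reg = re.compile(r"-agent-(?P<city>[^-]*)(?:-(?P<state>[^-]*))?-\d")

with_state = reg.search("agent-name-agent-orlando-fl-24408364")
without_state = reg.search("agent-name-agent-orlando-24408364")

print(with_state.group("city"), with_state.group("state"))        # orlando fl
print(without_state.group("city"), without_state.group("state"))  # orlando None

# groupdict(default=...) replaces None for groups that did not match
fields = without_state.groupdict(default="")
print(fields["city"], repr(fields["state"]))  # orlando ''
```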

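With this change, re.findall is no longer needed: you can use re.search, which returns a single result. Note that if you are not sure about the string format, you can always add a test to check whether there is a match.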
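Such a test can be a simple None check on the match object. This sketch uses a hypothetical helper, city_state, and made-up inputs. Note that because [^-]* cannot contain a dash, a multi-word city such as new-york-ny does not match this pattern at all, which is one more reason to guard against None.

```python
import re

reg = re.compile(r"-agent-(?P<city>[^-]*)(?:-(?P<state>[^-]*))?-\d")


def city_state(url):
    """Hypothetical helper: returns (city, state), or None when the URL doesn't match."""
    m = reg.search(url)
    if m is None:
        return None
    return m.group("city").title(), (m.group("state") or "").upper()


print(city_state("http://www.trulia.com/profile/agent-name-agent-orlando-fl-24408364/"))
# ('Orlando', 'FL')
print(city_state("http://www.trulia.com/profile/agent-name/"))
# None
print(city_state("http://example.com/profile/jane-agent-new-york-ny-123/"))
# None
```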

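To make the code more readable, I use named captures, (?P<name>...). This way you can easily retrieve a group's content with m.group('name'). If you want a slight speed-up, you can use numbered groups instead (though the difference is not significant).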
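For comparison, a sketch of the same pattern with plain numbered groups. The (?:...) part stays non-capturing, so there are exactly two groups, and Match.groups accepts a default for groups that did not match.

```python
import re

# Same pattern with numbered groups instead of named ones
reg = re.compile(r"-agent-([^-]*)(?:-([^-]*))?-\d")

m = reg.search("http://www.trulia.com/profile/agent-name-agent-orlando-fl-24408364/")
city, state = m.groups(default="")  # unmatched groups become "" instead of None
print(city.title(), state.upper())  # Orlando FL
```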
