How do I return the full words as a list from partial substring matches in Python?

3 Answers

User

#1 · Edited 2024-05-17 04:03:34

For a case-insensitive match with whitespace boundaries, you can use:

(?i)(?<!\S)\w*(?:tion|ex|ph|[oia]st)\w*(?!\S)

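The pattern matches:

(?i) inline modifier for case-insensitive matching (or use re.I)
(?<!\S) asserts a whitespace boundary on the left
\w* matches optional word characters
(?: opens a non-capturing group
- tion|ex|ph|[oia]st matches tion, ex, ph, or (via the character class) ost, ist, or ast
) closes the non-capturing group
\w* matches optional word characters
(?!\S) asserts a whitespace boundary on the right

See the regex demo and the Python demo: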

import re

def latin_ish_words(text):
    # Return every whitespace-delimited word that contains one of the target substrings
    pattern = r"(?i)(?<!\S)\w*(?:tion|ex|ph|[oia]st)\w*(?!\S)"
    return re.findall(pattern, text)

print(latin_ish_words("This functions as expected"))

Output

['functions', 'expected']
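
One consequence of the whitespace boundaries is worth seeing in a small sketch: a word that touches punctuation is not surrounded by whitespace, so it is skipped entirely. The helper name whitespace_bounded_words and the sample sentences below are made up for illustration; the pattern is the one from this answer.

```python
import re

# Pattern from this answer: the word must be delimited by whitespace
# (or by the start/end of the string) on both sides.
WS_PATTERN = re.compile(r"(?i)(?<!\S)\w*(?:tion|ex|ph|[oia]st)\w*(?!\S)")

def whitespace_bounded_words(text):
    # Hypothetical helper name, used only for this illustration
    return WS_PATTERN.findall(text)

plain = whitespace_bounded_words("An exception next")
punctuated = whitespace_bounded_words("An exception, next.")

print(plain)       # ['exception', 'next']
print(punctuated)  # [] because "," and "." are non-whitespace neighbours
```

If words next to punctuation should still be returned, a pattern without the whitespace lookarounds is probably the better fit.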

User

#2 · Edited 2024-05-17 04:03:34

You can use any of the following:

pattern=re.compile(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*")
pattern=re.compile(r"[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*")
pattern=re.compile(r"[^\W\d_]*?(?:tion|ex|ph|ost|ast|ist)[^\W\d_]*")

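The regex (see the regex demo) matches:

\w*? - zero or more word characters, as few as possible
(?:tion|ex|ph|ost|ast|ist) - one of the listed strings
\w* - zero or more word characters, as many as possible

The [a-zA-Z] variant matches only ASCII letters, while [^\W\d_] matches any Unicode letter.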
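
The difference between the ASCII and Unicode letter classes shows up as soon as a word contains an accented letter. The following is a minimal sketch; the sample word and the variable names are chosen purely for illustration.

```python
import re

ALTERNATIVES = r"(?:tion|ex|ph|ost|ast|ist)"

# ASCII letters only
ascii_pattern = re.compile(r"[a-zA-Z]*?" + ALTERNATIVES + r"[a-zA-Z]*")
# Any Unicode letter: a word character that is neither a digit nor an underscore
unicode_pattern = re.compile(r"[^\W\d_]*?" + ALTERNATIVES + r"[^\W\d_]*")

word = "g\u00e9ographie"  # "géographie", written with an escape for the accented letter

ascii_result = ascii_pattern.findall(word)
unicode_result = unicode_pattern.findall(word)

print(ascii_result)    # ['ographie'] because the accented letter cuts the word
print(unicode_result)  # ['géographie']
```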

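Note the use of a non-capturing group with re.findall. With a capturing group, re.findall returns the captured substrings instead of the full matches.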
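
A quick side-by-side sketch of that point, using the sample sentence from this thread (the variable names are illustrative):

```python
import re

text = "This functions as expected"

# Capturing group: findall returns only what the group captured
with_capturing = re.findall(r"\w*?(tion|ex|ph|ost|ast|ist)\w*", text)

# Non-capturing group: findall returns the whole match
with_non_capturing = re.findall(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*", text)

print(with_capturing)      # ['tion', 'ex']
print(with_non_capturing)  # ['functions', 'expected']
```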

If you only need to match words made of letters, and need to match them as whole words, add word boundaries: r"\b[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*\b"

See the Python demo:

import re

def latin_ish_words(text):
    # Non-capturing group, so findall returns the whole matched word
    pattern = re.compile(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*")
    return pattern.findall(text)

print(latin_ish_words("This functions as expected"))
# => ['functions', 'expected']
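
The word-boundary variant mentioned in this answer behaves differently on tokens that mix letters with digits or underscores. Here is a small sketch with a made-up input string: without \b the letters-only pattern returns fragments of such tokens, while with \b they are rejected as a whole.

```python
import re

sample = "next text2 ex_1"  # illustrative input, not from the original question

letters_only = re.compile(r"[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*")
whole_words = re.compile(r"\b[a-zA-Z]*?(?:tion|ex|ph|ost|ast|ist)[a-zA-Z]*\b")

fragments = letters_only.findall(sample)
bounded = whole_words.findall(sample)

print(fragments)  # ['next', 'text', 'ex']
print(bounded)    # ['next'] since "text2" and "ex_1" are not purely alphabetic words
```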

User

#3 · Edited 2024-05-17 04:03:34

Regarding ignoring case, the pattern

pattern=re.compile(r"tion|ex|ph|ost|ast|ist")
matches=pattern.findall(text)

does not do that. Consider the following example:

import re
pattern=re.compile(r"tion|ex|ph|ost|ast|ist")
text = "SCREAMING TEXT"
print(pattern.findall(text))

Output

[]

even though EX should be matched. To fix this, add the re.IGNORECASE flag like this:

import re
pattern=re.compile(r"tion|ex|ph|ost|ast|ist", re.IGNORECASE)
text = "SCREAMING TEXT"
print(pattern.findall(text))

Output

['EX']
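
Since the question asks for the full words rather than the matched fragment, the flag can be combined with the \w* wrapping used in the other answers. This is only a sketch of that combination; the function name full_words_ignoring_case is made up for illustration.

```python
import re

# \w*? and \w* extend the match to the whole word; re.IGNORECASE handles upper case
FULL_WORD_PATTERN = re.compile(r"\w*?(?:tion|ex|ph|ost|ast|ist)\w*", re.IGNORECASE)

def full_words_ignoring_case(text):
    # Hypothetical helper name, used only for this illustration
    return FULL_WORD_PATTERN.findall(text)

result = full_words_ignoring_case("SCREAMING TEXT")
print(result)  # ['TEXT']
```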
