Python regex: remove non-ASCII characters and words ending in a number

# Remove all these words:
>>> re.sub(r'\W|\b[^a-z]*[^a-z]\b', ' ', "1 123 - hey2 a2 1a3 ".lower())
' hey2 a2 1a3 '

# Keep all these words:
>>> re.sub(r'\W|\b[^a-z]*[^a-z]\b', ' ', "1st first a2a 2bb esta' ".lower())
'1st first a2a 2bb esta  '
# This works

2 answers

User

#1 · Edited 2024-09-30 07:36:25

Remove non-unicode characters and words ending in number

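It seems you want to remove any non-word character (matched by the \W pattern) and any "word" (a sequence of letters, digits, or _, matched by \w) that ends with a digit.

So you can use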

re.sub(r'\W|\b\w*\d\b', ' ', s)
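
To see what this does on the two inputs from the question, here is a minimal sketch. The helper name clean is made up for illustration, and it lowercases first only because the question does.

```python
import re

# Pattern from the answer: a non-word character, or a whole word ending in a digit.
PATTERN = re.compile(r'\W|\b\w*\d\b')

def clean(s):
    # Hypothetical helper: lowercase as in the question, then blank out every match.
    return PATTERN.sub(' ', s.lower())

# Every token here is either a non-word character or a word ending in a digit,
# so only spaces remain.
print(repr(clean("1 123 - hey2 a2 1a3 ")))

# None of these words ends in a digit, so only the apostrophe is replaced.
print(repr(clean("1st first a2a 2bb esta' ")))  # '1st first a2a 2bb esta  '
```

Each match is replaced by a single space, so runs of spaces are left behind; ' '.join(result.split()) would collapse them if that matters.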

Note that if you are working with Unicode strings in Python 2.x, you need to pass the re.UNICODE flag to make \W, \w, \d, and \b Unicode-aware.
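
In Python 3 the default is the other way around: str patterns are Unicode-aware already, and the re.ASCII flag restricts \w, \W, \d, and \b to ASCII. The sketch below assumes Python 3 and uses a made-up sample string. Whether non-ASCII letters should really be dropped is an assumption taken from the question title.

```python
import re

pattern = r'\W|\b\w*\d\b'
sample = "caf\u00e9 ni\u00f1o2 abc"  # made-up sample: "café niño2 abc"

# Default in Python 3: accented letters count as word characters.
unicode_result = re.sub(pattern, ' ', sample)

# With re.ASCII, accented letters are non-word characters and get replaced too.
ascii_result = re.sub(pattern, ' ', sample, flags=re.ASCII)

print(repr(unicode_result))  # 'café   abc'
print(repr(ascii_result))    # 'caf  ni   abc'
```

Note that under re.ASCII a non-ASCII letter also splits the word around it, which is why "o2" is matched as a separate word ending in a digit.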

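Pattern details

\W - a non-word character (any character other than a letter, digit, or _)
| - or
\b - a leading word boundary
\w* - zero or more (*) word characters
\d - a digit
\b - a trailing word boundary.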
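
The same breakdown can be written into the pattern itself with re.VERBOSE, which ignores whitespace and allows # comments inside the regex. This is only a sketch; the names VERBOSE_PATTERN and text are illustrative.

```python
import re

VERBOSE_PATTERN = re.compile(r"""
    \W      # a non-word character
    |       # or
    \b      # a leading word boundary
    \w*     # zero or more word characters
    \d      # a digit
    \b      # a trailing word boundary
""", re.VERBOSE)

text = "1 123 - hey2 a2 1a3 "

# The verbose form behaves the same as the compact one-liner.
same = VERBOSE_PATTERN.sub(' ', text) == re.sub(r'\W|\b\w*\d\b', ' ', text)
print(same)  # True
```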

Note that if you want to treat the _ character as a non-word character, replace \W with [\W_] and \w with [^\W_].
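
A small sketch of that underscore variant, assuming \w is swapped for [^\W_] (any word character except _). The sample string is made up.

```python
import re

# Pattern from the answer, and a variant that treats "_" as a non-word character.
# Assumption: \w is swapped for [^\W_] (any word character except "_").
default_pattern = r'\W|\b\w*\d\b'
underscore_pattern = r'[\W_]|\b[^\W_]*\d\b'

sample = "_abc_ foo2 bar"  # made-up input

default_result = re.sub(default_pattern, ' ', sample)
underscore_result = re.sub(underscore_pattern, ' ', sample)

print(repr(default_result))     # '_abc_   bar'
print(repr(underscore_result))  # ' abc    bar'
```

One caveat: \b itself still counts _ as a word character, so in something like "x_2" the second alternative does not match "2" and the result is "x 2".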

User

#2 · Edited 2024-09-30 07:36:25

You are missing a . (dot) before the *.

".*" matches zero or more of any character (except a newline by default).

  re.sub(r'\W|\b[^a-z]*[^a-z]\b',' ', "1 123 - hey2 a2 1a3 ".lower()) 

In [6]: re.sub(r'\W|\b[^a-z].*[^a-z]\b',' ', "1 123 - hey2 a2 1a3 asdasd".lower())
Out[6]: ' asdasd'
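
Here is a quick check of this suggestion against both inputs from the question, offered as a sketch with illustrative variable names. Because .* is greedy, the second alternative can run from the first non-letter all the way to the last non-letter that is followed by a word boundary, and everything in between is replaced as one match.

```python
import re

with_dot = r'\W|\b[^a-z].*[^a-z]\b'

remove_me = "1 123 - hey2 a2 1a3 asdasd"
keep_me = "1st first a2a 2bb esta' "

removed = re.sub(with_dot, ' ', remove_me)
kept = re.sub(with_dot, ' ', keep_me)

print(repr(removed))  # ' asdasd'
print(repr(kept))     # ' esta  '
```

On the second input the single greedy match also swallows "1st first a2a 2bb", which the question wanted to keep, so it is worth testing this pattern on the "keep" cases before relying on it.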
