Python regex: replacing Unicode characters

Posted 2024-10-01 09:26:51


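In the first test string I am trying to replace the Unicode right-arrow character (») in the middle of the text with a space, but it does not seem to work.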

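More generally, I am trying to remove every run of one or more Unicode "non-word" characters while keeping words that mix a-z0-9 with Unicode characters, or alternatively just remove \W (non-word characters).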

# -*- coding: utf-8 -*-
import re
str = 'hi… » Test'
str = 're of… » Pr'
str = 're of… » Pr | removepipeaswell'
print str
str = re.sub(r' [^a-z0-9]+ ', ' ', str , re.UNICODE|re.MULTILINE)
# str = re.sub(r' [^\p{Alpha}] ', ' ', str, re.UNICODE)
print str
're of… Pr removepipeaswell' #expected output

str_nbsp = 'afds » asf'

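Edit: added another test string. I do not want to remove the "of…" (Unicode ellipsis); I only want to remove runs of Unicode (non-word) characters that stand on their own.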

Edit: got this working on the test case (but not on the full HTML??? There it only replaces matches in the first half of the string and then ignores the rest.)

[code block not preserved in the source]

Edit: fml, it must be something stupid, like not reading the argument list properly: http://bytes.com/topic/python/answers/689341-sub-does-not-replace-all-occurences

[Whoever deleted their reply: thanks for your help.]

str = re.sub(r' [^a-z0-9]+ ', ' ', str)
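The argument-list mistake mentioned above can be illustrated with a small sketch. It is written for Python 3 (the question uses Python 2 print statements), and the sample text and the variable names are made up for illustration. The fourth positional parameter of re.sub is count, not flags, so the original call behaves like a replacement limited to 40 matches, which would explain why only the first part of a long HTML string was changed.

```python
import re

pattern = r' [^a-z0-9]+ '
# Made-up sample text containing 50 separate " » " runs.
text = 'x » ' * 50 + 'x'

# Signature: re.sub(pattern, repl, string, count=0, flags=0).
# A flags value passed as the fourth positional argument is read as count.
# re.UNICODE | re.MULTILINE has the integer value 32 | 8 == 40.
flags_as_count = int(re.UNICODE | re.MULTILINE)

# Equivalent to the call in the question: only the first 40 matches are replaced.
limited = re.sub(pattern, ' ', text, count=flags_as_count)

# Passing flags by keyword (or omitting them) replaces every match.
full = re.sub(pattern, ' ', text, flags=re.UNICODE)

print(flags_as_count)      # 40
print(limited.count('»'))  # 10 arrows left untouched
print(full.count('»'))     # 0
```

On short test strings the limit of 40 is never reached, which is why the test cases appeared to work.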

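The last test string, str_nbsp, is not matched by the regex above. One of its space characters is actually a non-breaking space. I used www.regexr.com and hovered over each character to find that out.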
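The same check can be done from Python instead of hovering over characters in a browser. The sketch below is Python 3 and assumes, for illustration only, that the first gap in str_nbsp is U+00A0 and the second is an ordinary space; the source does not say which one it was. It prints the Unicode name of each non-alphanumeric character and compares a literal-space pattern with one that uses \s.

```python
import re
import unicodedata

# Assumption: the first gap is U+00A0 (NO-BREAK SPACE), the second a plain space.
str_nbsp = 'afds\u00a0» asf'

# Show what each non-alphanumeric character really is.
for ch in str_nbsp:
    if not ch.isalnum():
        print(repr(ch), unicodedata.name(ch))

# A literal space in the pattern does not match U+00A0, so nothing is replaced.
literal_space = re.sub(r' [^a-z0-9]+ ', ' ', str_nbsp)

# For str patterns in Python 3, \s also matches Unicode whitespace such as U+00A0.
any_whitespace = re.sub(r'\s[^a-z0-9]+\s', ' ', str_nbsp)

print(repr(literal_space))
print(repr(any_whitespace))
```

In Python 2, \s only covers non-ASCII whitespace when the pattern and string are unicode objects and re.UNICODE is passed through the flags keyword.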


1 answer

Answer 1 · Posted 2024-10-01 09:26:51

str = re.sub(r' [^a-z0-9]+ ', ' ', str)
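As an alternative to the pattern above, here is a minimal Python 3 sketch that avoids depending on literal spaces. The helper name drop_symbol_only_tokens is hypothetical and not part of the original answer. It splits on any Unicode whitespace, including non-breaking spaces, and keeps only tokens that contain at least one word character, so "of…" survives while a lone "»" or "|" is dropped.

```python
import re


def drop_symbol_only_tokens(text):
    """Hypothetical helper: drop whitespace-separated tokens with no word character."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # which includes U+00A0 (NO-BREAK SPACE).
    return ' '.join(tok for tok in text.split() if re.search(r'\w', tok))


print(drop_symbol_only_tokens('hi… » Test'))
print(drop_symbol_only_tokens('re of… » Pr | removepipeaswell'))
print(drop_symbol_only_tokens('afds\u00a0» asf'))
```

One trade-off of this approach is that it collapses every run of whitespace into a single space, which may or may not be acceptable for full HTML input.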
