Simple regular expression in Python

2 Answers

User

Answer 1 · edited 2024-09-26 22:09:05

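Is all_comments of type str or unicode? If it is of type unicode and the characters print correctly, the regular expression should work.

If the string is of type str, you need to decode it with the correct encoding first. Assuming your encoding is UTF-8, this will work: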

filtered_comments = re.sub("[^\x30-\xFF]", " ", all_comments.decode('utf-8'))
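
For readers on Python 3, the same decode-then-substitute step can be sketched as follows. This is only an illustration under assumptions: the sample bytes in raw_comments are hypothetical, and in Python 3 the result of decode() is an ordinary str, which is already Unicode text.

```python
import re

# Hypothetical input: UTF-8 encoded bytes, e.g. as read from a file in binary mode.
raw_comments = "Mañana ♡".encode("utf-8")

# Decode first so the regex operates on characters rather than on raw bytes.
all_comments = raw_comments.decode("utf-8")

# Same character class as in the answer: replace everything outside U+0030..U+00FF.
filtered_comments = re.sub("[^\x30-\xFF]", " ", all_comments)

print(repr(filtered_comments))
```

The accented ñ survives because its code point (U+00F1) lies inside the kept range, while the heart (U+2661) does not.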

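One more thing to note: your ^\x30-\xFF also matches ! and many other symbols whose codes are below \x30 (the digit 0), so they get replaced as well. Perhaps you want ^\x20-\xFF instead, since \x20 is the space character, which is about the lowest typical ASCII character?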
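
The difference between the two ranges can be checked with a small experiment. This is a sketch that assumes Python 3 and uses a made-up sample string; the names sample, narrow, and wide are only for illustration.

```python
import re

# Hypothetical sample containing ASCII punctuation, accented letters, and a heart.
sample = "Hola! ¿Qué tal? 100% ♡"

# Range starting at \x30: "!" and "%" fall below the range and are replaced.
narrow = re.sub("[^\x30-\xFF]", " ", sample)

# Range starting at \x20: ASCII punctuation is kept, only the heart is replaced.
wide = re.sub("[^\x20-\xFF]", " ", sample)

print(repr(narrow))
print(repr(wide))
```

Both results keep the original length, because each rejected character is swapped for exactly one space.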

User

Answer 2 · edited 2024-09-26 22:09:05

Try the script below, and note the # coding=utf-8 declaration on its first line. See PEP-0263 for details.

# coding=utf-8
import re

comments = u"Odio ¿Mañana pensar porque RT luego pasa lo que pasa Marzo ♡♡♡"

rx = re.compile(u"[\u2661]+")

# If you want to remove non-ASCII characters, as you mentioned in comments,
# uncomment following regex. 
# Downside is it will remove all accented characters too.
#
# rx = re.compile(u"[^\x00-\x7F]+")

filtered_comments = re.sub(rx, " ", comments)

print filtered_comments

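It prints the following (the run of hearts is replaced by a single space, so the line ends in whitespace):

Odio ¿Mañana pensar porque RT luego pasa lo que pasa Marzo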
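
The script above targets Python 2. As a rough Python 3 counterpart, the sketch below runs both patterns from the answer side by side; the variable names hearts, non_ascii, no_hearts, and ascii_only are introduced here for illustration and are not part of the original answer.

```python
import re

comments = "Odio ¿Mañana pensar porque RT luego pasa lo que pasa Marzo ♡♡♡"

# Remove only runs of the white heart symbol (U+2661).
hearts = re.compile("[\u2661]+")
no_hearts = hearts.sub(" ", comments)

# Stricter alternative from the commented-out line: replace every run of
# non-ASCII characters, which also removes accented letters such as ñ.
non_ascii = re.compile(r"[^\x00-\x7F]+")
ascii_only = non_ascii.sub(" ", comments)

print(repr(no_hearts))
print(repr(ascii_only))
```

Comparing the two results shows the trade-off the answer mentions: the second pattern is simpler to write but turns "Mañana" into "Ma ana".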
