Unable to replace \xe2\x80\xa6\n in a string using regex in Python

Posted 2024-06-02 00:57:56


I have the following string:

data = "pizza won't divorce you pizza won't betray you pizza won't cheat on you pizza won't fight with you  why don't people just \xe2\x80\xa6\n"

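I want to find every expression of the form \[a-z][a-z][0-9]\ in it (for example the \xe2\x80\xa6\ at the end of the data string) so that I can replace them. I tried the following code: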

re.findall(r"\\[a-z][a-z][0-9]\\+", data)

But it returns an empty list. Please help.


3 answers

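If that is what you want, you have to define the string as a raw string. In an ordinary string literal Python interprets \xe2, \x80, \xa6 and \n as escape sequences, so the resulting string contains no literal backslashes for the regex to match.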

import re

# Raw string literal: the backslashes are kept as literal characters.
data = r"pizza won't divorce you pizza won't betray you pizza won't cheat on you pizza won't fight with you  why don't people just \xe2\x80\xa6\n"

print(re.findall(r"\\[a-z][a-z]?[0-9]+", data))

Output: ['\\xe2', '\\x80', '\\xa6']
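
To see why the raw string matters, here is a minimal sketch written for Python 3 that compares the two kinds of literal. The variable names `normal` and `raw` and the shortened sample text are illustrative assumptions, not part of the question.

```python
import re

# Ordinary literal: \xe2, \x80, \xa6 and \n are interpreted as escapes,
# so the string holds four single characters and no backslash at all.
normal = "just \xe2\x80\xa6\n"

# Raw literal: every backslash is kept as a literal character.
raw = r"just \xe2\x80\xa6\n"

pattern = r"\\[a-z][a-z]?[0-9]+"

print("\\" in normal)                # False: nothing for the regex to find
print(re.findall(pattern, normal))   # []
print(re.findall(pattern, raw))      # ['\\xe2', '\\x80', '\\xa6']
```

If the data arrives from a file or a network socket rather than from a literal in the source code, the raw prefix does not apply, and what the string contains depends on how it was read and decoded.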

Another solution:

print(re.findall(r"\\[a-z]{1,2}\d{1,2}", data))
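
Since the question asks about replacing the matches, the same pattern can be passed to `re.sub`. This is only a sketch: the empty replacement string and the shortened sample text are assumptions, because the question does not say what the sequences should be replaced with.

```python
import re

# Raw literal, so the escape sequences are present as literal text.
data = r"why don't people just \xe2\x80\xa6\n"

# Remove every backslash followed by 1-2 letters and 1-2 digits.
# The trailing \n has no digit, so it is left untouched.
cleaned = re.sub(r"\\[a-z]{1,2}\d{1,2}", "", data)

print(cleaned)
```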

To work with text, you should use Unicode strings: the bytestring b"\xe2\x80\xa6" is the UTF-8 encoding of … (U+2026 HORIZONTAL ELLIPSIS).

text = u"pizza won't divorce\u2026"

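To replace it, work on the Unicode string and match the single character U+2026 rather than the three escaped bytes.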
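
One way this could look in Python 3 is sketched below. The bytes are decoded once, and the replacement then targets the single ellipsis character. The replacement text "..." and the names `raw_bytes` and `cleaned` are assumptions chosen for illustration.

```python
# Bytes as they might arrive from a file or socket (UTF-8 encoded).
raw_bytes = b"pizza won't divorce\xe2\x80\xa6\n"

# Decode to text: the three bytes become one character, U+2026.
text = raw_bytes.decode("utf-8")

# Replace the ellipsis character with three ASCII dots (hypothetical choice).
cleaned = text.replace("\u2026", "...")

print(cleaned)
```

No regular expression is needed here, because after decoding there is exactly one character to look for.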
