Unicode Replacement Using Regex and Python

Posted 2024-09-30 05:20:27

I have a string as follows:

str1 = "heylisten\uff08there is something\uff09to say \uffa9"

I need to replace each Unicode value matched by a regex with the same value padded by a space on both sides.

Desired output string:

out = "heylisten \uff08 there is something \uff09 to say  \uffa9 "

I used re.findall to collect all the matches and then replaced each one. It looks like this:

p1 = re.findall(r'\uff[0-9a-e][0-9]', str1, flags = re.U)  
out = str1
for item in p1:
    print item
    print out
    out= re.sub(item, r" " + item + r" ", out) 

Output:

'heylisten\\ uff08 there is something\\ uff09 to say \\ uffa9 ' 

The approach above prints an extra "\" and separates it from uff. What is wrong here? I even tried re.search, but it seems to separate only \uff08. Is there a better way?


2 Answers

I have a string as follows:

str1 = "heylisten\uff08there is something\uff09to say \uffa9"

I need to replace the unicode values ...

You don't have any Unicode values. You have a byte string.
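
To make that distinction concrete, here is a small Python 3 sketch (the question's code is Python 2). The raw literal stands in for the Python 2 byte string, and the names `literal_text`, `decoded_text`, and `padded` are made up for illustration. The regex below assumes that Python 2's `re` treated the unknown escape `\u` as a plain `u`, which is consistent with the output shown in the question.

```python
import re

# Raw literal: mimics a Python 2 byte string, where "\uff08" is six
# separate ASCII characters (backslash, u, f, f, 0, 8).
literal_text = r"heylisten\uff08there"
# Ordinary Python 3 literal: "\uff08" is decoded to a single character.
decoded_text = "heylisten\uff08there"

print(len(literal_text))  # 20
print(len(decoded_text))  # 15

# Assumption: the question's pattern effectively matched "uff08" without
# the leading backslash, so the space lands between "\" and "uff08".
padded = re.sub(r"(uff[0-9a-e][0-9])", r" \1 ", literal_text)
print(padded)
```

The last line prints the backslash stranded in front of the inserted space, which is the same symptom as in the question.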

str1 = u"heylisten\uff08there is something\uff09to say \uffa9"
 ...
p1 = re.sub(ur'([\uff00-\uffe9])', r' \1 ', str1)
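
For readers on Python 3, a rough equivalent of the substitution above might look like the following. It is only a sketch: the `ur''` prefix no longer exists in Python 3, every `str` is already Unicode, and the name `fullwidth` is introduced here for illustration.

```python
import re

# In Python 3 a plain string literal is already Unicode, so "\uff08"
# is a single character here.
str1 = "heylisten\uff08there is something\uff09to say \uffa9"

# Character range U+FF00 to U+FFE9, as in the answer above; the re module
# in Python 3.3+ understands \uXXXX escapes inside the pattern itself.
fullwidth = re.compile(r"([\uff00-\uffe9])")

# Pad every matched character with one space on each side.
out = fullwidth.sub(r" \1 ", str1)
print(out)
```

Because the string already had a space before \uffa9, the result has two spaces there, matching the desired output in the question.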

print re.sub(r"(\\uff[0-9a-e][0-9])", r" \1 ", x)

You can use this re.sub directly. See the demo.

http://regex101.com/r/sU3fA2/67

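import re
# The pattern matches the literal text "\uffXY" (a backslash followed by
# "uff" and two more characters), not an already-decoded Unicode character.
p = re.compile(ur'(\\uff[0-9a-e][0-9])', re.UNICODE)
test_str = u"heylisten\uff08there is something\uff09to say \uffa9"
# Raw string, so that \1 is a group backreference and not the character U+0001.
subst = ur" \1 "

result = re.sub(p, subst, test_str)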

Output:

heylisten \uff08 there is something \uff09 to say  \uffa9
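
If the text really does contain literal backslash sequences (for example, escaped text read from a file), a Python 3 sketch of the same idea could look like this. The raw literal and the names `literal_text` and `pattern` are assumptions for illustration, not part of the original answer.

```python
import re

# Raw literal: the text contains a real backslash followed by "uff08",
# not the decoded character U+FF08.
literal_text = r"heylisten\uff08there is something\uff09to say \uffa9"

# "\\" in the regex matches one literal backslash; the rest is the
# character class from the answer above.
pattern = re.compile(r"(\\uff[0-9a-e][0-9])")

# Raw replacement string so that \1 is treated as a backreference.
result = pattern.sub(r" \1 ", literal_text)
print(result)
```

Note that `[0-9a-e][0-9]` only covers sequences whose last hex digit is 0-9, so something like \uff0a would not be matched. If that matters, a class such as `[0-9a-f]{2}` may be closer to what is wanted.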
