Python: convert 4-byte characters to avoid the MySQL error "Incorrect string value:"

>>> import re
>>> highpoints = re.compile(u'[\U00010000-\U0010ffff]')
>>> example = u'Some example text with a sleepy face: \U0001f62a'
>>> highpoints.sub(u'', example)
u'Some example text with a sleepy face: '
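The session above is Python 2 (note the `u''` literals). Below is a minimal Python 3 sketch of the same idea. It assumes Python 3.3 or later, where PEP 393 made every `str` a sequence of full code points, so the single-range pattern works no matter how the interpreter was built. The helper name `strip_astral` is hypothetical. Code points from U+10000 upward are exactly the ones that take 4 bytes in UTF-8, which MySQL's 3-byte `utf8` character set cannot store.

```python
import re

# Python 3.3+ (PEP 393): str indexes full code points on every build,
# so one character class covers U+10000..U+10FFFF.
HIGHPOINTS = re.compile('[\U00010000-\U0010ffff]')


def strip_astral(text: str) -> str:
    """Drop every character outside the Basic Multilingual Plane.

    These are the characters that need 4 bytes when encoded as UTF-8.
    """
    return HIGHPOINTS.sub('', text)


example = 'Some example text with a sleepy face: \U0001f62a'
print(repr(strip_astral(example)))  # 'Some example text with a sleepy face: '
```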

1 answer

Anonymous user

Answer #1 · Posted on 2024-10-01 17:28:01

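In a UCS-2 (narrow) build, Python internally uses two code units for every Unicode character above the \U0000ffff code point. Your regular expression has to work with those code units, so you need the following pattern to match them: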

highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')

This regex matches any code point encoded as a UTF-16 surrogate pair (see UTF-16 Code points U+10000 to U+10FFFF).
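To see why a narrow build needs two code units per character, here is a sketch of the standard UTF-16 surrogate-pair arithmetic (the function name `to_surrogate_pair` is hypothetical). It then simulates a narrow-build string in Python 3 by building it from two lone surrogates, and shows that the two-class pattern from this answer matches that pair.

```python
import re
from typing import Tuple

# Surrogate-pair pattern from the answer: a high surrogate followed by a low one.
SURROGATE_PAIR = re.compile('[\ud800-\udbff][\udc00-\udfff]')


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError('code point is inside the BMP or out of range')
    offset = code_point - 0x10000      # 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x1F62A)  # U+1F62A SLEEPY FACE
print(hex(high), hex(low))              # 0xd83d 0xde2a

# Simulate how a UCS-2 build stores the character: as two code units.
narrow = 'face: ' + chr(high) + chr(low)
print(len(narrow))                          # 8
print(repr(SURROGATE_PAIR.sub('', narrow))) # 'face: '
```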

To make this work on both UCS-2 and UCS-4 Python builds, use try:/except to pick one pattern or the other:

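try:
    # UCS-4 build: astral characters are single code points
    highpoints = re.compile(u'[\U00010000-\U0010ffff]')
except re.error:
    # UCS-2 build: the literals above turn into surrogate pairs, which makes
    # the character range invalid, so match the surrogate pairs instead
    highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')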
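As an alternative to the try/except approach, a sketch that branches on `sys.maxunicode` follows. That value is `0xFFFF` on narrow (UCS-2) builds and `0x10FFFF` on wide (UCS-4) builds and on every Python 3.3+ interpreter. The `u''` prefixes are kept so the snippet runs on Python 2 as well as on Python 3.3+; the constant name `HIGHPOINTS` is only illustrative.

```python
import re
import sys

if sys.maxunicode > 0xFFFF:
    # Wide build (UCS-4) or Python 3.3+: astral characters are single code points.
    HIGHPOINTS = re.compile(u'[\U00010000-\U0010ffff]')
else:
    # Narrow build (UCS-2): astral characters are stored as surrogate pairs.
    HIGHPOINTS = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')

example = u'Some example text with a sleepy face: \U0001f62a'
print(repr(HIGHPOINTS.sub(u'', example)))
```

Checking `sys.maxunicode` states the intent explicitly, while try/except relies on the regex compiler rejecting the range on narrow builds; both select the same pattern.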

Demo on a UCS-2 Python build:

>>> import re
>>> highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')
>>> example = u'Some example text with a sleepy face: \U0001f62a'
>>> highpoints.sub(u'', example)
u'Some example text with a sleepy face: '
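The demo above deletes the offending characters. If silent deletion is undesirable, the variant sketched below replaces each 4-byte character with U+FFFD REPLACEMENT CHARACTER, which itself needs only 3 bytes in UTF-8, so the result fits MySQL's 3-byte `utf8` character set. It assumes Python 3.3+; the function name `mysql_utf8_safe` and its `replacement` parameter are hypothetical. Switching the column and connection to `utf8mb4` avoids the problem altogether and keeps the characters.

```python
import re

# Python 3.3+: matches every code point that needs 4 bytes in UTF-8.
HIGHPOINTS = re.compile('[\U00010000-\U0010ffff]')


def mysql_utf8_safe(text: str, replacement: str = '\ufffd') -> str:
    """Replace characters at U+10000 and above with `replacement`.

    U+FFFD (the default) encodes to 3 UTF-8 bytes, so the result contains
    no character longer than 3 bytes.
    """
    if HIGHPOINTS.search(replacement):
        raise ValueError('replacement must itself fit in 3 UTF-8 bytes')
    return HIGHPOINTS.sub(replacement, text)


cleaned = mysql_utf8_safe('Some example text with a sleepy face: \U0001f62a')
print(ascii(cleaned))                                # 'Some example text with a sleepy face: \ufffd'
print(max(len(ch.encode('utf-8')) for ch in cleaned))  # 3
```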
