JSON contains incorrect UTF-8 \u00ce\u00b2 instead of Unicode \u03b2: how do I fix it in Python?

2 answers

Anonymous user

Answer 1 · edited 2024-09-21 01:11:20

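Here is a string fixer that runs after the JSON has been loaded. It handles UTF-8-like sequences of any length and leaves alone escape sequences that do not look like UTF-8 sequences.

Example: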

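import json
import re

def fix(bad):
    # Each mojibake character is one byte value (U+0080-U+00FF) decoded on its own;
    # Latin-1 turns it back into that byte, and UTF-8 then decodes the whole run.
    def repair(m):
        try:
            return m.group(0).encode('latin1').decode('utf8')
        except UnicodeDecodeError:
            return m.group(0)  # not valid UTF-8 after all: leave it unchanged
    return re.sub(r'[\xc2-\xf4][\x80-\xbf]+', repair, bad)

# 2- and 3-byte UTF-8-like sequences and one correct escape code.
json_text = '''\
{
  "something":"text \\u00ce\\u00b2 text \\u00e4\\u00bd\\u00a0\\u597d..."
}
'''

data = json.loads(json_text)
bad_str = data['something']
good_str = fix(bad_str)
print(bad_str)
print(good_str)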

Output:

text Î² text ä½ 好...
text β text 你好...
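The repair works because the bad JSON turned every UTF-8 byte into the code point with the same value (U+0080 to U+00FF), which is exactly the mapping Latin-1 uses, so encoding back to Latin-1 recovers the original bytes. When such strings are nested inside objects and arrays, one option is to apply the same fix to every string while walking the decoded data. This is a minimal sketch assuming the data holds only dicts, lists and scalars; `fix_all` is a hypothetical helper name, and falling back to the unchanged text on `UnicodeDecodeError` is an added assumption, not part of the answer above.

```python
import json
import re

# Lead byte C2-F4 followed by one or more continuation bytes 80-BF,
# as they appear after each byte was decoded as its own code point.
_UTF8_LIKE = re.compile(r'[\xc2-\xf4][\x80-\xbf]+')

def fix(bad):
    def repair(m):
        try:
            # Latin-1 maps U+0000-U+00FF back to the identical byte values.
            return m.group(0).encode('latin1').decode('utf8')
        except UnicodeDecodeError:
            return m.group(0)  # not valid UTF-8 after all: leave it unchanged
    return _UTF8_LIKE.sub(repair, bad)

def fix_all(value):
    # Hypothetical helper: recursively repair every string, dict keys included.
    if isinstance(value, str):
        return fix(value)
    if isinstance(value, list):
        return [fix_all(v) for v in value]
    if isinstance(value, dict):
        return {fix_all(k): fix_all(v) for k, v in value.items()}
    return value  # numbers, booleans and None pass through untouched

data = json.loads('{"name": "\\u00ce\\u00b2", "tags": ["\\u00e4\\u00bd\\u00a0\\u597d", 1]}')
print(fix_all(data))  # {'name': 'β', 'tags': ['你好', 1]}
```

Because the regex is greedy, a valid sequence immediately followed by a stray continuation byte is matched as one run, fails to decode, and is left entirely unchanged.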

Anonymous user

Answer 2 · edited 2024-09-21 01:11:20

Something like this, perhaps. It is limited to 2-byte UTF-8 characters.

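import re

j = '{"something":"text \\u00ce\\u00b2 text..."}'

def decodeu(match):
    # Rebuild the two raw bytes from the hex digits of the \u00XX escapes.
    raw = bytes([int(match.group(1), 16), int(match.group(2), 16)])
    # Decode them as UTF-8 and write the character back as a JSON \uXXXX escape.
    return '\\u%04x' % ord(raw.decode('utf-8'))

# Lead byte C2-DF followed by a continuation byte 80-BF, both written as \u00XX escapes.
j = re.sub(r'\\u00(c[2-9a-f]|d[0-9a-f])\\u00([89ab][0-9a-f])', decodeu, j)

print(j)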

For the example, this prints {"something":"text \u03b2 text..."}. At that point you can load it as regular JSON and get the final string you want.
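A minimal sketch of the whole pipeline for this second answer: rewrite the 2-byte escape pairs in the raw text, then parse the result with `json.loads`. `repair_json_text` is a hypothetical helper name, and accepting uppercase hex digits via `re.IGNORECASE` is an added assumption; like the regex above, it only handles 2-byte sequences (U+0080 to U+07FF).

```python
import json
import re

# \u00XX escape pair: lead byte C2-DF, then continuation byte 80-BF.
# IGNORECASE is an assumption so that escapes like \u00CE are also accepted.
_PAIR = re.compile(r'\\u00(c[2-9a-f]|d[0-9a-f])\\u00([89ab][0-9a-f])', re.IGNORECASE)

def _to_escape(match):
    raw = bytes([int(match.group(1), 16), int(match.group(2), 16)])
    # Every 2-byte UTF-8 character is <= U+07FF, so four hex digits suffice.
    return '\\u%04x' % ord(raw.decode('utf-8'))

def repair_json_text(text):
    # Hypothetical helper: fix the escapes first, then parse as normal JSON.
    return json.loads(_PAIR.sub(_to_escape, text))

data = repair_json_text('{"something":"text \\u00ce\\u00b2 text..."}')
print(data['something'])  # text β text...
```

Because the regex runs on the raw JSON text, it assumes no escaped backslash (`\\`) is immediately followed by literal `u00..` text that happens to match the pattern.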

