File contains \u00c2\u00a0, convert to characters

I have a JSON file that contains text like this:
<pre><code>.....wax, and voila!\u00c2\u00a0At the moment you can't use our ...</code></pre>
My simple question is: how do I convert (not delete) these \u codes into spaces, apostrophes, etc.?

Input: a text file containing <code>.....wax, and voila!\u00c2\u00a0At the moment you can't use our ...</code>
Output: <code>.....wax, and voila!(converted to a space)At the moment you can't use our ...</code>

What I have tried:
<ul>
<li>Using <code>.json()</code></li>
<li>Various combinations of <code>.encode()</code>, <code>.decode()</code>, etc.</li>
</ul>

Edit 1: When I upload this file to BigQuery, I get a stray <code>Â</code> character.

Larger sample:
<pre><code>{
  "xxxx1": "...You don\u2019t nee...",
  "xxxx2": "...Gu\u00e9rer...",
  "xxxx3": "...boost.\u00a0Sit back an....",
  "xxxx4": "\" \u306f\u3058\u3081\u307e\u3057\u3066\"",
  "xxxx5": "\u00a0\n\u00a0",
  "xxxx6": "It was Christmas Eve babe\u2026",
  "xxxx7": "It\u2019s xxx xxx\u2026"
}</code></pre>

Python code:
<pre><code>import json
import re
import codecs


def load():
    epos_export = r'{"xxxx1": "...You don\u2019t nee...","xxxx2": "...Gu\u00e9rer...","xxxx3": "...boost.\u00a0Sit back an....","xxxx4": "\" \u306f\u3058\u3081\u307e\u3057\u3066\"","xxxx5": "\u00a0\n\u00a0","xxxx6": "It was Christmas Eve babe\u2026","xxxx7": "It\u2019s xxx xxx\u2026"}'
    # Replace every run of \u00XX escapes, then parse the JSON text
    x = json.loads(re.sub(r"(?i)(?:\\u00[0-9a-f]{2})+", unmangle_utf8, epos_export))
    with open("TEST.json", "w") as file:
        json.dump(x, file)


def unmangle_utf8(match):
    escaped = match.group(0)                   # '\\u00e2\\u0082\\u00ac'
    hexstr = escaped.replace(r'\u00', '')      # 'e282ac'
    buffer = codecs.decode(hexstr, "hex")      # b'\xe2\x82\xac'
    try:
        return buffer.decode('utf8')           # '€'
    except UnicodeDecodeError:
        # Falls through and returns None; in CPython re.sub treats that as an
        # empty replacement, so the matched escapes are dropped.
        print("Could not decode buffer: %s" % buffer)


if __name__ == '__main__':
    load()</code></pre>
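The pair `\u00c2\u00a0` looks like classic mojibake: U+00A0 (NO-BREAK SPACE) is encoded in UTF-8 as the two bytes `C2 A0`, and if those bytes are read as Latin-1, each byte becomes its own code point, giving `Â` followed by a no-break space. That would also explain the stray `Â` seen in BigQuery. A minimal sketch, assuming the whole string was mangled this way: encode it back to bytes with Latin-1 and decode those bytes as UTF-8. `fix_mojibake` is a hypothetical helper name, not part of any library.

```python
import json


def fix_mojibake(text: str) -> str:
    """Hypothetical helper: undo UTF-8 bytes that were decoded as Latin-1.

    Assumes the whole string is affected. If it contains characters outside
    Latin-1 (such as U+2019) or the bytes are not valid UTF-8 (such as a lone
    U+00E9), the string is returned unchanged.
    """
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# The text from the question, parsed as ordinary JSON escapes:
# the result contains U+00C2 ('Â') followed by U+00A0.
raw = r'''"...wax, and voila!\u00c2\u00a0At the moment you can't use our ..."'''
parsed = json.loads(raw)
fixed = fix_mojibake(parsed)

# ascii() shows non-ASCII characters as escapes, so the difference is visible
print(ascii(parsed))
print(ascii(fixed))
```

This also suggests why ad-hoc `.encode()`/`.decode()` combinations tend to fail: Latin-1 maps code points U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF, so it is the encoding that reproduces the original bytes exactly before they are decoded as UTF-8.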

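The question's `unmangle_utf8` already decodes runs such as `\u00c2\u00a0` correctly; text is lost only when decoding fails. A lone `\u00e9` (the `é` in `Gu\u00e9rer`) or a lone `\u00a0` is not valid UTF-8 on its own, so the `except` branch returns `None`, and in CPython `re.sub` treats that as an empty replacement. A minimal sketch of a fix, assuming such runs are genuine escapes rather than mangled bytes: return the original escape unchanged so that `json.loads` decodes it, and JSON-escape successful results so that a decoded `"` cannot break the document. `ESCAPE_RUN` and `load_mangled_json` are hypothetical names. A run that mixes mangled bytes with genuine Latin-1 characters fails to decode as a whole and is kept as escapes.

```python
import codecs
import json
import re

# Hypothetical name: runs of \u00XX escapes, same pattern as in the question.
ESCAPE_RUN = re.compile(r"(?i)(?:\\u00[0-9a-f]{2})+")


def unmangle_utf8(match: re.Match) -> str:
    escaped = match.group(0)                    # e.g. '\\u00c2\\u00a0'
    hexstr = escaped.replace(r"\u00", "")       # 'c2a0'
    buffer = codecs.decode(hexstr, "hex")       # b'\xc2\xa0'
    try:
        decoded = buffer.decode("utf-8")        # '\xa0'
    except UnicodeDecodeError:
        # Not mangled UTF-8 (e.g. a lone \u00e9 or \u00a0): keep the escape
        # so json.loads turns it into the character it already names.
        return escaped
    # JSON-escape the result so characters such as '"' keep the text valid.
    return json.dumps(decoded, ensure_ascii=False)[1:-1]


def load_mangled_json(text: str):
    """Hypothetical helper: repair escape runs with unmangle_utf8, then parse."""
    return json.loads(ESCAPE_RUN.sub(unmangle_utf8, text))


sample = r'{"a": "voila!\u00c2\u00a0At", "b": "Gu\u00e9rer", "c": "boost.\u00a0Sit", "d": "\u00a0\n\u00a0", "e": "don\u2019t"}'
data = load_mangled_json(sample)
print(ascii(data))
```

Escapes outside the `\u00XX` range, such as `\u2019`, are not touched by the pattern and are decoded by `json.loads` as usual.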

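Once the escapes are decoded, the values hold a real U+00A0 rather than `Â` plus U+00A0. The question asks for plain spaces; one option is Unicode NFKC normalization, which maps U+00A0 to an ordinary space. This is a sketch under the assumption that rewriting other compatibility characters is acceptable: NFKC also turns `…` (U+2026) into three periods. `to_plain_text` is a hypothetical name. Writing with `ensure_ascii=False` and `encoding="utf-8"` stores the characters themselves instead of `\u` escapes; both forms are valid JSON, so this only changes how the file reads.

```python
import json
import unicodedata


def to_plain_text(value: str) -> str:
    """Hypothetical helper: NFKC-normalize a string.

    Assumption: compatibility characters may be rewritten. U+00A0 becomes an
    ordinary space, but U+2026 also becomes three periods. Use a plain
    str.replace on U+00A0 instead if only the space should change.
    """
    return unicodedata.normalize("NFKC", value)


# Values after the escapes have been decoded (taken from the question's sample).
data = {
    "xxxx2": "...Gu\u00e9rer...",
    "xxxx3": "...boost.\u00a0Sit back an....",
    "xxxx6": "It was Christmas Eve babe\u2026",
}
cleaned = {key: to_plain_text(value) for key, value in data.items()}

# Write real UTF-8 characters instead of \uXXXX escapes.
with open("TEST.json", "w", encoding="utf-8") as file:
    json.dump(cleaned, file, ensure_ascii=False)
```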