Encode a raw string so that it can be decoded as JSON

import json
# Regular literal: Python interprets \u003C and \n before json ever sees them
scraped = '{"propertyNotes": [{"title": "Local Description", "text": "\u003Cp\u003EAPPS\u003C/p\u003E\n\n\u003Cp\u003EBig Island Revealed (comes as app or as a printed book)\u003C/p\u003E\n\n\u003Cp\u003EAloha Big Island\u003C/p\u003E\n\n\u003Cp\u003EBig Island\u003C/p\u003E\n\n\u003Cp\u003EBig Island Smart Maps (I like this one a lot)\u003C/p\u003E\n\n\u003Cp\u003EBig Island Adventures (includes videos)\u003C/p\u003E\n\n\u003Cp\u003EThe descriptions of beaches are helpful. Suitability for swimming, ease of access, etc. is included. Some beaches are great for picnics and scenic views, while others are suitable for swimming and snorkeling. Check before you go.\u003C/p\u003E"}]}'
# Raw literal: the backslash escapes are kept exactly as written
scraped_raw = r'{"propertyNotes": [{"title": "Local Description", "text": "\u003Cp\u003EAPPS\u003C/p\u003E\n\n\u003Cp\u003EBig Island Revealed (comes as app or as a printed book)\u003C/p\u003E\n\n\u003Cp\u003EAloha Big Island\u003C/p\u003E\n\n\u003Cp\u003EBig Island\u003C/p\u003E\n\n\u003Cp\u003EBig Island Smart Maps (I like this one a lot)\u003C/p\u003E\n\n\u003Cp\u003EBig Island Adventures (includes videos)\u003C/p\u003E\n\n\u003Cp\u003EThe descriptions of beaches are helpful. Suitability for swimming, ease of access, etc. is included. Some beaches are great for picnics and scenic views, while others are suitable for swimming and snorkeling. Check before you go.\u003C/p\u003E"}]}'
data = json.loads(scraped_raw)  # works
print(data["propertyNotes"])
failed = json.loads(scraped)  # does not work
print(failed["propertyNotes"])

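3 answers

User

Answer 1 · edited 2024-09-29 23:22:35

OK. Since I am on Windows, I had to set up the console to handle special characters, which I did by typing chcp 65001 in the terminal. I also used a regular expression and chained the string-manipulation calls, which is the Python way.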

usable_json = json.loads(re.search('start_sub_string(.*)end_sub_string', hxs.xpath("//script[contains(., 'some_string')]//text()").extract_first()).group(1))
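The one-liner above can be hard to debug when the pattern does not match. Below is a minimal sketch of the same idea on a plain string, without Scrapy. The `script_text` fragment and the start and end markers in `pattern` are hypothetical stand-ins for whatever the real page contains.

```python
import json
import re

# Hypothetical stand-in for the text of the <script> element found via XPath.
script_text = 'window.__DATA__ = {"propertyNotes": [{"title": "Local Description"}]}; // end'

# Hypothetical start and end markers around the JSON payload.
# The lazy group (.*?) stops at the first end marker; DOTALL lets it span lines.
pattern = re.compile(r'window\.__DATA__ = (.*?); // end', re.DOTALL)

match = pattern.search(script_text)
if match is None:
    # Fail with a clear message instead of an AttributeError on .group(1).
    raise ValueError("JSON payload not found in script text")

usable_json = json.loads(match.group(1))
print(usable_json["propertyNotes"][0]["title"])
```

Splitting the search from the `json.loads` call makes it obvious whether a failure comes from the regular expression or from the JSON itself.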

After that, the problem went away. I will sort out the encoding and escaping when I write the rows to the database.

User

Answer 2 · edited 2024-09-29 23:22:35

If you are on Python 3.6 or later, I think you can use this:

 json.loads(scraped.encode('unicode_escape'))

According to the docs, this gives you:

Encoding suitable as the contents of a Unicode literal in ASCII-encoded Python source code, except that quotes are not escaped. Decodes from Latin-1 source code. Beware that Python source code actually uses UTF-8 by default.

That seems to be exactly what you need.
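A minimal sketch of this approach on a shortened sample, together with one caveat worth checking on your own data. The short `scraped` and `accented` strings are made up for illustration. As far as I can tell, `unicode_escape` writes non-ASCII Latin-1 characters as `\xNN`, which is not a valid JSON escape, so the trick appears to be safe only for ASCII content.

```python
import json

# Non-raw literal: Python turns \u003C into "<" and \n into a real newline.
scraped = '{"text": "\u003Cp\u003EAPPS\u003C/p\u003E\n\nAloha"}'

# unicode_escape turns the real newlines back into the two characters \ and n.
encoded = scraped.encode('unicode_escape')
data = json.loads(encoded)  # json.loads accepts bytes since Python 3.6
print(repr(data["text"]))

# Caveat: a non-ASCII character such as e-acute is written as \xe9,
# which the JSON decoder rejects.
accented = '{"text": "caf\u00e9\n"}'
try:
    json.loads(accented.encode('unicode_escape'))
    non_ascii_ok = True
except json.JSONDecodeError:
    non_ascii_ok = False
print(non_ascii_ok)
```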

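User

Answer 3 · edited 2024-09-29 23:22:35

The problem arises because the string you have contains escaped control characters that Python interprets when it evaluates the literal, turning them into real characters. That is not necessarily bad in general, but these particular escapes produce control characters that JSON does not accept inside a string. As in Turn's answer, you need to handle the string without interpreting the escaped values: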

json.loads(scraped.encode('unicode_escape'))

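This works by encoding the content as Latin-1 and writing control characters such as the newline back out as literal escape sequences like \n. Printable ASCII is left untouched, so the < that Python produced from \u003C stays a plain <.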
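To see what is going on, it can help to compare the two literals directly. The sketch below uses a shortened, made-up sample (`raw` and `cooked` are illustrative names) and prints the decoder's error message rather than assuming its exact wording.

```python
import json

raw = r'{"text": "\u003Cp\u003Ehi\u003C/p\u003E\n"}'    # backslashes kept as written
cooked = '{"text": "\u003Cp\u003Ehi\u003C/p\u003E\n"}'   # Python interprets the escapes

print(repr(raw))
print(repr(cooked))  # contains "<", ">" and a real newline character

# The real newline inside the JSON string is what the decoder objects to.
try:
    json.loads(cooked)
    error = None
except json.JSONDecodeError as exc:
    error = exc.msg
print(error)

# unicode_escape re-escapes the newline but does not restore \u003C.
reescaped = cooked.encode('unicode_escape')
print(reescaped)
```

Both routes end up with the same decoded data here, because JSON itself turns `\u003C` into `<` when it parses the raw text.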

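However, if my understanding is correct, you may not want to do this, because the escaped control characters are lost and the data may then differ from the original.

You can see this by noting that the control characters are gone once the encoded bytes are converted back into a normal Python string: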

scraped.encode('unicode_escape').decode('utf-8')

If you want to keep the control characters, you have to escape the string yourself before loading it.
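Two possible ways to do that, sketched on a made-up sample. The first is not from the answers above: it relies on the `strict` parameter of the standard `json` decoder, which allows control characters inside strings when set to false. The second uses a hypothetical helper, `escape_control_chars`, and assumes control characters only occur inside string values, not as whitespace between JSON tokens.

```python
import json
import re

# Made-up sample with a real newline and a real tab inside the JSON string.
cooked = '{"text": "line one\n\tline two"}'

# Option 1: let the decoder accept raw control characters inside strings.
lenient = json.loads(cooked, strict=False)

# Option 2 (hypothetical helper): rewrite each control character
# (U+0000 to U+001F) as a JSON \uXXXX escape before loading.
def escape_control_chars(text):
    return re.sub(
        r'[\x00-\x1f]',
        lambda m: '\\u{:04x}'.format(ord(m.group())),
        text,
    )

escaped = escape_control_chars(cooked)
strict = json.loads(escaped)

print(repr(lenient["text"]))
print(repr(strict["text"]))
```

Either way the newline and tab survive into the decoded value, so the data matches what the page originally contained.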
