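<p>In this case, <code>decode()</code> is the appropriate operation to perform on the downloaded JSON string.</p>
<p>Consider this simplified example, which downloads the JSON for a single candidate by specifying <code>limit=1&offset=0</code> in the URL:</p>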
<pre><code>>>> from urllib.request import urlopen
>>> url = urlopen('https://represent.opennorth.ca/candidates/house-of-commons/?limit=1&offset=0')
>>> content = url.read()
>>> print(type(content))
<class 'bytes'>
>>> print(url.getheader('Content-Type'))
application/json; charset=utf-8
>>> content
b'{"objects": [{"first_name": "Pascale", "last_name": "D\\u00e9ry", "election_name": "House of Commons", "name": "Pascale D\\u00e9ry", "elected_office": "candidate", "url": "", "gender": "", "extra": {}, "related": {"boundary_url": "/boundaries/federal-electoral-districts-next-election/24025/", "election_url": "/elections/house-of-commons/"}, "source_url": "http://www.conservative.ca/?member=candidates", "offices": [], "party_name": "Conservative", "incumbent": null, "district_name": "Drummond", "email": "", "personal_url": "http://www.conservative.ca/team/member/?fname=Pascale&lname=D\\u00e9ry&type=candidates", "photo_url": "http://www.conservative.ca/media/team/Pascale-Dery.jpg"}], "meta": {"next": "/candidates/house-of-commons/?limit=1&offset=1", "total_count": 1129, "previous": null, "limit": 1, "offset": 0}}'
</code></pre>
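<p>From this we can see that the content is of type <code>bytes</code>, i.e. a byte string. Byte strings have no <code>encode()</code> method; they are assumed to already be in <em>some</em> encoding, and can only be decoded to unicode using the correct encoding. In this case the data is UTF-8 encoded JSON, as the <code>Content-Type</code> header shows.</p>
<p>There are a few things you can do here:</p>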
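<p>As a minimal sketch of that point, the snippet below decodes a byte string using the charset named in a <code>Content-Type</code> header. The <code>raw</code> bytes and the <code>headers</code> object are hypothetical stand-ins built by hand so the example runs without a network call; with a real <code>urlopen()</code> response, <code>url.headers.get_content_charset()</code> should be the analogous call.</p>

```python
from email.message import Message

# Hypothetical stand-ins for url.read() and the response headers.
raw = '{"name": "Pascale D\\u00e9ry", "city": "Montréal"}'.encode('utf-8')
headers = Message()
headers['Content-Type'] = 'application/json; charset=utf-8'

# In Python 3, bytes can be decoded but not encoded.
has_encode = hasattr(raw, 'encode')
has_decode = hasattr(raw, 'decode')

# Take the charset from Content-Type, falling back to UTF-8 if none is declared.
charset = headers.get_content_charset() or 'utf-8'
text = raw.decode(charset)

# A codec that cannot represent the bytes fails loudly.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```

<p>Falling back to UTF-8 is an assumption made for this sketch; it matches the default encoding for JSON, but other content types may need a different default.</p>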
<ul>
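<li><p>Open the output file in binary mode and write the JSON bytes to
the file exactly as received. Because the incoming data is UTF-8 encoded,
this produces a UTF-8 encoded JSON file, which may be what the
CSV converter requires.</p>
</li>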
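<li><p>Open the output file in text mode using an encoding supported by the
CSV converter, decode the incoming JSON bytes from UTF-8 to a text
string (unicode), and write the decoded string to the file:</p>
<pre><code>with open('output.json', 'w', encoding='iso-8859-1') as f:
    f.write(content.decode('utf-8'))
</code></pre>
<p>Here I chose iso-8859-1 as an example; you could also choose ASCII. Note that if the target encoding is UTF-8, the data is decoded and then re-encoded as UTF-8, so you might as well write the bytes directly as in the first option.</p></li>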
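<li><p>Another option, and the one I would recommend, is to decode the incoming data with a JSON decoder and then write it to a file in your preferred encoding. This has the advantage of ensuring that the incoming data really is JSON, and lets you catch any errors before passing the file to the CSV converter:</p>
<pre><code>import json

# Parse first so that invalid JSON raises before the output file is created.
data = json.loads(content.decode('utf-8'))
with open('output.json', 'w') as f:
    json.dump(data, f)
</code></pre></li>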
</ul>
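<p>The first and third options above can be sketched side by side as follows. The helper names <code>save_raw</code> and <code>save_validated</code>, the sample <code>content</code> bytes, and the temporary file paths are all hypothetical, chosen only so the example runs offline.</p>

```python
import json
import os
import tempfile

# Hypothetical stand-in for the bytes returned by url.read().
content = b'{"objects": [{"name": "Pascale D\\u00e9ry"}]}'

def save_raw(payload, path):
    """Option 1: write the bytes untouched, keeping the original encoding."""
    with open(path, 'wb') as f:
        f.write(payload)

def save_validated(payload, path, source_encoding='utf-8'):
    """Option 3: parse first so malformed JSON fails before anything is written."""
    data = json.loads(payload.decode(source_encoding))
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f)
    return data

workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, 'raw.json')
checked_path = os.path.join(workdir, 'checked.json')
bad_path = os.path.join(workdir, 'bad.json')

save_raw(content, raw_path)
data = save_validated(content, checked_path)

# Truncated input is rejected by the JSON decoder instead of reaching the file.
try:
    save_validated(content[:-5], bad_path)
    rejected = False
except json.JSONDecodeError:
    rejected = True
```

<p>Writing the raw bytes is the cheapest route, while the validated route costs one parse but surfaces a truncated or malformed download immediately.</p>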
<hr/>
<p>You might find it easier to use the <a href="http://docs.python-requests.org/en/latest/" rel="nofollow"><code>requests</code></a> module. It has JSON parsing and character decoding built in:</p>
<pre><code>>>> import requests
>>> r = requests.get('https://represent.opennorth.ca/candidates/house-of-commons/?limit=1&offset=0')
>>> type(r.text)
<class 'str'>
>>> type(r.content)
<class 'bytes'>
>>> data = r.json()
>>> data
{'objects': [{'first_name': 'Pascale', 'extra': {}, 'url': '', 'last_name': 'Déry', 'district_name': 'Drummond', 'incumbent': None, 'offices': [], 'gender': '', 'personal_url': 'http://www.conservative.ca/team/member/?fname=Pascale&lname=Déry&type=candidates', 'elected_office': 'candidate', 'party_name': 'Conservative', 'source_url': 'http://www.conservative.ca/?member=candidates', 'election_name': 'House of Commons', 'email': '', 'name': 'Pascale Déry', 'photo_url': 'http://www.conservative.ca/media/team/Pascale-Dery.jpg', 'related': {'boundary_url': '/boundaries/federal-electoral-districts-next-election/24025/', 'election_url': '/elections/house-of-commons/'}}], 'meta': {'total_count': 1129, 'limit': 1, 'next': '/candidates/house-of-commons/?limit=1&offset=1', 'previous': None, 'offset': 0}}
</code></pre>
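<p>Here <code>r.content</code> is the raw downloaded content, in whatever encoding the server sent it. <code>r.text</code> is the same content decoded into a unicode string, and <code>r.json()</code> gives you the parsed JSON data as a dictionary. Writing it to a file is then straightforward:</p>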
<pre><code>import json

with open('output.json', 'w') as f:
    json.dump(r.json(), f)
</code></pre>
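<p>One detail that may matter for the CSV converter is how <code>json.dump()</code> handles non-ASCII characters such as the "é" in "Déry". The sketch below uses a hypothetical one-field record rather than the real response, and compares the default escaped output with <code>ensure_ascii=False</code>.</p>

```python
import json

# Hypothetical record mirroring one field of the parsed response.
data = {'name': 'Pascale Déry'}

# Default: non-ASCII characters are escaped, so the output is pure ASCII
# and is safe to write with almost any file encoding.
escaped = json.dumps(data)

# ensure_ascii=False keeps the literal character, so the output file must
# be opened with an encoding that can represent it (for example UTF-8).
readable = json.dumps(data, ensure_ascii=False)

escaped_is_ascii = all(ord(ch) < 128 for ch in escaped)
readable_is_ascii = all(ord(ch) < 128 for ch in readable)
```

<p>Both forms parse back to the same data, so the choice mostly depends on whether the downstream tool copes better with escapes or with literal characters.</p>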