Reading data from a website and trying to get accented characters

2 answers

User

Answer 1 · edited 2024-10-04 05:20:04

If your real goal is to store the data in a spreadsheet, try the following program:

```
import requests
import csv

base_url = "https://represent.opennorth.ca/candidates/house-of-commons/?limit=250&offset={}"
#base_filename=r"F:\electoral_map\candidates_python\candidates{0}_to_{1}.csv"
base_filename = r"candidates{0}_to_{1}.csv"
keys = [
    'name',
    'first_name',
    'last_name',
    'election_name',
    'elected_office',
    'district_name',
    'email',
    'incumbent',
    'party_name',
    'personal_url',
    'photo_url',
    'source_url',
    'url',
]

# Fetch the candidates 250 at a time and write each page to its own CSV file
for i in range(0, 2000, 250):
    url = base_url.format(i)
    filename = base_filename.format(i, i + 250)
    # .json() decodes the UTF-8 body and parses it, so "Déry" arrives as a real str
    data = requests.get(url).json()['objects']

    # newline='' is what the csv module expects; encoding='utf-8' keeps accented characters intact
    with open(filename, 'wt', encoding='utf-8', newline='') as f:
        w = csv.DictWriter(f, keys, extrasaction='ignore')
        w.writeheader()
        w.writerows(data)
```

Note: this program requires Python 3. If you are using Python 2, let me know and I will give you a version that works there.
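To check the CSV step without hitting the network, here is a minimal sketch that feeds csv.DictWriter a hand-written record modelled on the candidate JSON shown in the next answer. The helper name records_to_csv, the sample record and the temporary file name are illustrative assumptions. Once the JSON has been parsed, the accented characters are ordinary str values and survive as long as the file is opened with a Unicode encoding. Using 'utf-8-sig' instead of 'utf-8' prepends a byte-order mark, which some spreadsheet programs (Excel among them) use to detect UTF-8.

```python
import csv
import io
import os
import tempfile

# Hypothetical sample record, modelled on the candidate JSON returned by the API
sample = [{
    'name': 'Pascale Déry',
    'first_name': 'Pascale',
    'last_name': 'Déry',
    'party_name': 'Conservative',
    'gender': '',  # not listed in keys, so extrasaction='ignore' drops it
}]
keys = ['name', 'first_name', 'last_name', 'party_name']


def records_to_csv(records, fieldnames):
    """Serialise a list of dicts to CSV text (hypothetical helper)."""
    buf = io.StringIO(newline='')
    w = csv.DictWriter(buf, fieldnames, extrasaction='ignore')
    w.writeheader()
    w.writerows(records)
    return buf.getvalue()


text = records_to_csv(sample, keys)

# newline='' keeps the writer's \r\n line endings untranslated;
# 'utf-8-sig' prepends a byte-order mark to the file
path = os.path.join(tempfile.gettempdir(), 'candidates_sample.csv')
with open(path, 'w', encoding='utf-8-sig', newline='') as f:
    f.write(text)

print(text)
```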

User

Answer 2 · edited 2024-10-04 05:20:04

In this case, decode() is the appropriate operation to perform on the downloaded JSON string.

Consider this simplified example, which downloads the JSON for a single candidate object by specifying limit=1&offset=0 in the URL:

```
>>> from urllib.request import urlopen
>>> url = urlopen('https://represent.opennorth.ca/candidates/house-of-commons/?limit=1&offset=0')
>>> content = url.read()
>>> print(type(content))
<class 'bytes'>
>>> print(url.getheader('Content-Type'))
application/json; charset=utf-8
>>> content
b'{"objects": [{"first_name": "Pascale", "last_name": "D\\u00e9ry", "election_name": "House of Commons", "name": "Pascale D\\u00e9ry", "elected_office": "candidate", "url": "", "gender": "", "extra": {}, "related": {"boundary_url": "/boundaries/federal-electoral-districts-next-election/24025/", "election_url": "/elections/house-of-commons/"}, "source_url": "http://www.conservative.ca/?member=candidates", "offices": [], "party_name": "Conservative", "incumbent": null, "district_name": "Drummond", "email": "", "personal_url": "http://www.conservative.ca/team/member/?fname=Pascale&lname=D\\u00e9ry&type=candidates", "photo_url": "http://www.conservative.ca/media/team/Pascale-Dery.jpg"}], "meta": {"next": "/candidates/house-of-commons/?limit=1&offset=1", "total_count": 1129, "previous": null, "limit": 1, "offset": 0}}'
```

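From this we can see that the content is of type bytes, i.e. a byte string. Byte strings have no encode() method; they are assumed to be already in some encoding and can only be decoded to Unicode using the correct encoding. In this case the data is UTF-8 encoded JSON, as the Content-Type header indicates.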
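The same point as a small offline sketch, using an abbreviated copy of the byte string above (shortened for illustration, not the full response). Note that the server sends é as the JSON escape \u00e9, so it is the JSON parser, not decode(), that finally produces the accented character.

```python
import json

# Abbreviated from the response shown above; the \\u00e9 is a literal JSON escape
content = b'{"objects": [{"last_name": "D\\u00e9ry", "name": "Pascale D\\u00e9ry"}]}'

print(type(content))               # <class 'bytes'>
print(hasattr(content, 'encode'))  # False: bytes can only be decoded

text = content.decode('utf-8')     # bytes -> str, using the charset from Content-Type
print(type(text))                  # <class 'str'>
print(text)                        # still contains the escape sequence D\u00e9ry

data = json.loads(text)            # the JSON parser turns \u00e9 into é
print(data['objects'][0]['last_name'])
```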

There are several things you can do here:

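Open the output file in binary mode and write the downloaded bytes to it unchanged, i.e. open('output.json', 'wb') followed by f.write(content). Because the incoming data is UTF-8 encoded, this produces a UTF-8 encoded JSON file, which may be what your CSV converter requires.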
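Open the output file in text mode with an encoding supported by the CSV converter, decode the incoming JSON from UTF-8 into a text (Unicode) string, and write the decoded string to the file: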
```
with open('output.json', 'w', encoding='iso-8859-1') as f:
    f.write(content.decode('utf-8'))
```
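Here I chose iso-8859-1 as an example; you could also choose ASCII. Note that if you choose UTF-8, the data is simply decoded and re-encoded as UTF-8, so you might as well write the raw bytes as in the first option.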
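Another option, and the one I recommend, is to decode the incoming data with a JSON decoder and then write it to the file using your preferred encoding. This has the advantage of ensuring that the incoming data really is JSON, and it lets you catch any errors before the data is passed to the CSV converter: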
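```
import json

# Parse first, so invalid JSON raises before the output file is created
data = json.loads(content.decode('utf-8'))
with open('output.json', 'w', encoding='utf-8') as f:
    # ensure_ascii=False writes "Déry" rather than the escape "D\u00e9ry"
    json.dump(data, f, ensure_ascii=False)
```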
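To compare the three options side by side without a network call, the sketch below applies each one to a shortened sample of the downloaded bytes and writes into a throwaway temporary directory; the sample and the file names are assumptions for illustration. It also shows why ASCII or iso-8859-1 both work in the second option for this particular response: the server escapes é as \u00e9, so the decoded JSON text is pure ASCII. Passing ensure_ascii=False to json.dump is what makes the accented characters appear literally in the output file.

```python
import json
import os
import tempfile

# Shortened, illustrative sample of the downloaded bytes (UTF-8 JSON with a \u00e9 escape)
content = b'{"objects": [{"name": "Pascale D\\u00e9ry"}]}'
outdir = tempfile.mkdtemp()  # throwaway directory for the three output files


def out(name):
    return os.path.join(outdir, name)


# First option: binary mode, write the bytes unchanged
with open(out('option1.json'), 'wb') as f:
    f.write(content)

# Second option: decode from UTF-8, let the text file re-encode as iso-8859-1
with open(out('option2.json'), 'w', encoding='iso-8859-1') as f:
    f.write(content.decode('utf-8'))

# Third option: parse the JSON, then dump it with literal accented characters
data = json.loads(content.decode('utf-8'))
with open(out('option3.json'), 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False)

# The decoded text is pure ASCII (é is escaped), so any ASCII-compatible
# encoding in the second option yields exactly the original bytes
print(content.decode('utf-8').isascii())

for name in ('option1.json', 'option2.json', 'option3.json'):
    with open(out(name), 'rb') as f:
        print(name, f.read())
```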

You may find it easier to use the requests module, which has built-in JSON parsing and character decoding:

```
>>> import requests
>>> r = requests.get('https://represent.opennorth.ca/candidates/house-of-commons/?limit=1&offset=0')
>>> type(r.text)
<class 'str'>
>>> type(r.content)
<class 'bytes'>
>>> data = r.json()
>>> data
{'objects': [{'first_name': 'Pascale', 'extra': {}, 'url': '', 'last_name': 'Déry', 'district_name': 'Drummond', 'incumbent': None, 'offices': [], 'gender': '', 'personal_url': 'http://www.conservative.ca/team/member/?fname=Pascale&lname=Déry&type=candidates', 'elected_office': 'candidate', 'party_name': 'Conservative', 'source_url': 'http://www.conservative.ca/?member=candidates', 'election_name': 'House of Commons', 'email': '', 'name': 'Pascale Déry', 'photo_url': 'http://www.conservative.ca/media/team/Pascale-Dery.jpg', 'related': {'boundary_url': '/boundaries/federal-electoral-districts-next-election/24025/', 'election_url': '/elections/house-of-commons/'}}], 'meta': {'total_count': 1129, 'limit': 1, 'next': '/candidates/house-of-commons/?limit=1&offset=1', 'previous': None, 'offset': 0}}
```

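Here r.content is the raw downloaded content, in whatever encoding the server sent it. r.text is that same raw content decoded into a Unicode string, and r.json() gives you the parsed JSON data as a dictionary. Writing it to a file is then straightforward: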

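```
import json

with open('output.json', 'w', encoding='utf-8') as f:
    # ensure_ascii=False keeps accented characters such as "é" as-is
    json.dump(r.json(), f, ensure_ascii=False)
```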
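r.text relies on the charset parameter of the Content-Type header seen earlier (application/json; charset=utf-8). The sketch below is a rough stdlib analogue of that decoding step, not requests' actual implementation: the helper name decode_body is hypothetical, and requests has additional fallbacks (for example guessing the encoding via r.apparent_encoding when no charset is known).

```python
import json
from email.message import Message


def decode_body(body: bytes, content_type: str) -> str:
    """Decode a response body using the charset in its Content-Type header.

    Hypothetical helper; falls back to UTF-8, the standard encoding for JSON.
    """
    msg = Message()
    msg['Content-Type'] = content_type
    charset = msg.get_content_charset() or 'utf-8'  # lower-cased charset, or fallback
    return body.decode(charset)


body = b'{"name": "Pascale D\\u00e9ry"}'
text = decode_body(body, 'application/json; charset=utf-8')
data = json.loads(text)
print(data['name'])

# The charset really matters for non-ASCII bytes, e.g. a Latin-1 body:
print(decode_body('Déry'.encode('iso-8859-1'), 'text/plain; charset=ISO-8859-1'))
```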
