Python CSV to JSON parser adds quotes to output

Posted 2024-09-28 21:25:52


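Thanks to user Petri, I now have a Python CSV-to-JSON script that converts a Geonames CSV dump into mongoimport-friendly JSON.

The problem is that Geonames has a field named alternatenames, which is currently quoted and handled as one long string, so it cannot be queried properly in MongoDB. I would like to turn the field into an array of strings, for example: "alternatenames":["name1", "name2"]

The Python script looks like this: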

import csv, simplejson, decimal, codecs

data = open("cities.txt")
reader = csv.DictReader(data, delimiter=",", quotechar='"')

with codecs.open("cities.json", "w", encoding="utf-8") as out:
   for r in reader:
      for k, v in r.items():
         # make sure nulls are generated
         if not v:
            r[k] = None
         # parse and generate decimal arrays
         elif k == "loc":
            r[k] = [decimal.Decimal(n) for n in v.strip("[]").split(",")]
         # generate a number
         elif k == "geonameid":
            r[k] = int(v)
      out.write(simplejson.dumps(r, ensure_ascii=False, use_decimal=True)+"\n")

My CSV contains the following fields:

^{pr2}$

My current JSON output looks like this:

{"loc": [48.91667, 32.48333], "name": "Zamīn Sūkhteh", "geonameid": 3, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Zamin Sukhteh,Zamīn Sūkhteh", "asciiname": "Zamin Sukhteh", "admin4_code": null}
{"loc": [48.9, 32.5], "name": "Yekāhī", "geonameid": 5, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Yekahi,Yekāhī", "asciiname": "Yekahi", "admin4_code": null}
{"loc": [48.2, 32.1], "name": "Tarvīḩ ‘Adāī", "geonameid": 7, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Tarvih `Adai,Tarvīḩ ‘Adāī", "asciiname": "Tarvih `Adai", "admin4_code": null}

I would like to change the JSON output so that alternatenames becomes an array of strings, as follows (scroll right to alternatenames):

{"loc": [48.91667, 32.48333], "name": "Zamīn Sūkhteh", "geonameid": 3, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Zamin Sukhteh", "Zamīn Sūkhteh"], "asciiname": "Zamin Sukhteh", "admin4_code": null}
{"loc": [48.9, 32.5], "name": "Yekāhī", "geonameid": 5, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Yekahi", "Yekāhī"], "asciiname": "Yekahi", "admin4_code": null}
{"loc": [48.2, 32.1], "name": "Tarvīḩ ‘Adāī", "geonameid": 7, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Tarvih `Adai", "Tarvīḩ ‘Adāī"], "asciiname": "Tarvih `Adai", "admin4_code": null}

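Also, should I change the quotechar in the CSV exported from Access 2010 to ^ instead of the current quote character, to avoid double quoting?

Thanks for your help.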


3 Answers

Add another "elif" after the existing "elif" branches to handle "alternatenames":

     elif k == "alternatenames":
        r[k] = [name.strip() for name in v.split(",")]

This first splits the string on commas, then strips leading and trailing whitespace from each name.
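A minimal sketch of that split-and-strip step as a standalone function. The name `split_alternatenames` is hypothetical and not part of the original script, and dropping empty pieces (for example after a trailing comma) is an assumption about the desired behaviour.

```python
def split_alternatenames(value):
    """Split a comma-separated string into a list of trimmed names."""
    if not value:
        return []
    # Split on commas, trim whitespace, and skip empty pieces (assumption).
    return [name.strip() for name in value.split(",") if name.strip()]


names = split_alternatenames("Zamin Sukhteh, Zamīn Sūkhteh")
print(names)
```

In the original loop, empty values are already caught by the `if not v` branch, so the empty-string guard here only matters when the function is used on its own.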

Try including the following:

elif k == "alternatenames":
   # split() already returns a list; wrapping it in [] would nest it
   r[k] = v.split(",")

I don't think your quoting is the problem here. You have to explicitly convert that field into a list of strings yourself.
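To illustrate why the quote character is probably not the culprit, here is a small sketch using a hypothetical two-column sample (the real dump has more fields). It suggests that `csv.DictReader` already removes the surrounding quotes, leaving one plain string that still has to be split.

```python
import csv
import io

# Hypothetical two-column sample; the real dump has more fields.
sample = 'geonameid,alternatenames\n3,"Zamin Sukhteh,Zamīn Sūkhteh"\n'

reader = csv.DictReader(io.StringIO(sample), delimiter=",", quotechar='"')
row = next(reader)

# The csv module has already removed the quotes; the value is one plain string.
raw = row["alternatenames"]
as_list = raw.split(",")
print(repr(raw))
print(as_list)
```

The quotes are what keep the embedded comma from being read as a field separator, so they are doing their job in the CSV; the list conversion is a separate step.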

Warning: the code below is untested

elif k == "alternatenames":
    r[k] = unicode.split(v, ',')

I am assuming v is a character-based unicode string (the unicode type exists only in Python 2); if it is ASCII, adjust accordingly.
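For Python 3, where every `str` is already Unicode, the whole conversion might look like the sketch below. It is not the original script: `convert_row` is a hypothetical helper, the sample row is made up, and the standard `json` module with `float` is swapped in for `simplejson` with `decimal.Decimal`, which is an assumption about acceptable precision.

```python
import csv
import io
import json


def convert_row(row):
    """Return a copy of a CSV row with typed values for JSON output."""
    out = {}
    for key, value in row.items():
        if not value:
            out[key] = None  # empty fields become null
        elif key == "loc":
            # float replaces decimal.Decimal here (assumption)
            out[key] = [float(n) for n in value.strip("[]").split(",")]
        elif key == "geonameid":
            out[key] = int(value)
        elif key == "alternatenames":
            out[key] = [name.strip() for name in value.split(",")]
        else:
            out[key] = value
    return out


# Hypothetical sample with only a few of the real columns.
sample = (
    'geonameid,name,alternatenames,loc,cc2\n'
    '5,Yekāhī,"Yekahi,Yekāhī","[48.9, 32.5]",\n'
)
rows = [convert_row(r) for r in csv.DictReader(io.StringIO(sample))]
line = json.dumps(rows[0], ensure_ascii=False)
print(line)
```

Building a new dictionary instead of assigning into the row while iterating over it keeps the loop easy to reason about; writing one `json.dumps` line per row preserves the line-delimited format that mongoimport reads.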
