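<p>Thanks to user Petri, I have a CSV-to-JSON Python script that converts a GeoNames CSV dump into mongoimport-friendly JSON.</p>
<p>The problem is that GeoNames has a field called <code>alternatenames</code>, which is currently quoted and treated as one long string, so it cannot be queried properly in MongoDB. I would like to turn the field into an array of strings, e.g. <code>"alternatenames": ["name1", "name2"]</code>.</p>
<p>The Python script looks like this:</p>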
<pre><code>import csv, simplejson, decimal, codecs

data = open("cities.txt")
reader = csv.DictReader(data, delimiter=",", quotechar='"')

with codecs.open("cities.json", "w", encoding="utf-8") as out:
    for r in reader:
        for k, v in r.items():
            # make sure nulls are generated
            if not v:
                r[k] = None
            # parse and generate decimal arrays
            elif k == "loc":
                r[k] = [decimal.Decimal(n) for n in v.strip("[]").split(",")]
            # generate a number
            elif k == "geonameid":
                r[k] = int(v)
        out.write(simplejson.dumps(r, ensure_ascii=False, use_decimal=True)+"\n")
</code></pre>
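<p>My CSV contains the same fields that appear as keys in the JSON output below (<code>geonameid</code>, <code>name</code>, <code>asciiname</code>, <code>alternatenames</code>, <code>loc</code>, and so on).</p>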
<p>My current JSON output looks like this:</p>
<pre><code>{"loc": [48.91667, 32.48333], "name": "Zamīn Sūkhteh", "geonameid": 3, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Zamin Sukhteh,Zamīn Sūkhteh", "asciiname": "Zamin Sukhteh", "admin4_code": null}
{"loc": [48.9, 32.5], "name": "Yekāhī", "geonameid": 5, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Yekahi,Yekāhī", "asciiname": "Yekahi", "admin4_code": null}
{"loc": [48.2, 32.1], "name": "Tarvīḩ ‘Adāī", "geonameid": 7, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": "Tarvih `Adai,Tarvīḩ ‘Adāī", "asciiname": "Tarvih `Adai", "admin4_code": null}
</code></pre>
<p>I would like to change the JSON output so that it contains an array of strings, as shown below (scroll right to <code>alternatenames</code>):</p>
<pre><code>{"loc": [48.91667, 32.48333], "name": "Zamīn Sūkhteh", "geonameid": 3, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Zamin Sukhteh", "Zamīn Sūkhteh"], "asciiname": "Zamin Sukhteh", "admin4_code": null}
{"loc": [48.9, 32.5], "name": "Yekāhī", "geonameid": 5, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Yekahi", "Yekāhī"], "asciiname": "Yekahi", "admin4_code": null}
{"loc": [48.2, 32.1], "name": "Tarvīḩ ‘Adāī", "geonameid": 7, "feature_class": "P", "admin3_code": null, "admin2_code": null, "cc2": null, "feature_code": "PPL", "country_code": "IR", "admin1_code": "15", "alternatenames": ["Tarvih `Adai", "Tarvīḩ ‘Adāī"], "asciiname": "Tarvih `Adai", "admin4_code": null}
</code></pre>
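<p>One way the desired output could be produced is to add a branch for <code>alternatenames</code> that splits the string on commas. The following is only a minimal sketch under some assumptions: it uses the standard-library <code>json</code> module with <code>float</code> values instead of <code>simplejson</code> with <code>decimal.Decimal</code>, the helper name <code>convert_row</code> is hypothetical, the sample row is made up to match the output above, and it assumes no individual alternate name contains a comma.</p>

```python
import csv
import io
import json


def convert_row(row):
    """Return a new dict with typed values (hypothetical helper, not from the original script)."""
    out = {}
    for key, value in row.items():
        if not value:
            # Empty CSV cells become JSON null.
            out[key] = None
        elif key == "alternatenames":
            # Assumption: names are comma-separated and contain no commas themselves.
            out[key] = [name.strip() for name in value.split(",") if name.strip()]
        elif key == "loc":
            # Assumption: float is precise enough here; the original script uses Decimal.
            out[key] = [float(n) for n in value.strip("[]").split(",")]
        elif key == "geonameid":
            out[key] = int(value)
        else:
            out[key] = value
    return out


# Made-up sample input with only a few of the real columns.
sample = io.StringIO(
    'geonameid,name,alternatenames,loc\n'
    '3,Zamīn Sūkhteh,"Zamin Sukhteh,Zamīn Sūkhteh","[48.91667,32.48333]"\n'
)
for row in csv.DictReader(sample, delimiter=",", quotechar='"'):
    print(json.dumps(convert_row(row), ensure_ascii=False))
```

<p>Building a new dict rather than editing the row in place keeps the conversion easy to test on its own, but the same <code>elif</code> branch could be dropped into the existing loop instead.</p>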
<p>Also, should I change the <code>quotechar</code> for the CSV exported from Access 2010 to <code>^</code> instead of <code>"</code>, to avoid double quoting?</p>
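<p>To make the <code>quotechar</code> question concrete, here is a small sketch of how the <code>csv</code> module treats the two choices. The two sample lines are hypothetical and are not taken from a real Access 2010 export. As far as the parser is concerned, the reader's <code>quotechar</code> only has to match whatever character the export actually used.</p>

```python
import csv
import io

# Hypothetical one-line exports of the same record, quoted two different ways.
quoted_with_double = '3,"Zamin Sukhteh,Zamīn Sūkhteh"\n'
quoted_with_caret = '3,^Zamin Sukhteh,Zamīn Sūkhteh^\n'


def read_first_row(text, quotechar):
    """Parse text as CSV with the given quote character and return the first row."""
    reader = csv.reader(io.StringIO(text), delimiter=",", quotechar=quotechar)
    return next(reader)


row_double = read_first_row(quoted_with_double, '"')
row_caret = read_first_row(quoted_with_caret, "^")

# Both variants keep the comma-containing field as a single value.
print(row_double == row_caret)
```

<p>In this sketch both variants parse to the same two fields, so the choice mainly matters if the data itself contains the quote character.</p>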
<p>Thanks for your help.</p>