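Using data from data.gov, for example "Gravesite locations of veterans and beneficiaries in Hawaii, as of January 2011" (http://www.data.gov/raw/4608), I am trying to parse the CSV with Python and process each row: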
import csv

randomData = csv.DictReader(open('/downloads/ngl_hawaii.csv', 'rb'), delimiter=",")
for row in randomData:
    print row
Sample CSV data:
d_first_name,d_mid_name,d_last_name,d_suffix,d_birth_date,d_death_date,section_id,row_num,site_num,cem_name,cem_addr_one,cem_addr_two,city,state,zip,cem_url,cem_phone,relationship,v_first_name,v_mid_name,v_last_name,v_suffix,branch,rank,war
Joe,"E","JoJo","","10/02/1920","03/12/2000","100-E","","3","HAWAII STATE VETERANS CEMETERY","KAMEHAMEHA HIGHWAY","","KANEOHE","HI","111444","","SXXXXX","Veteran (Self)","Joe","E","JoJo","","US ARMY","SGT","WORLD WAR II"
The result is not pretty (one row printed):
{'v_last_name': None, 'cem_addr_two': None, 'rank': None, 'd_suffix': None, 'city': None, 'row_num': None, 'zip': None, 'cem_phone': None, 'd_last_name': None, e, 'd_first_name': 'Joe,"E","JoJo","","10/02/1920","03/12/2000","100-E","","3","HAWAII STATE VETERANS CEMETERY","KAMEHAMEHA HIGHWAY","","KANEOHE","HI","11144 "SXXXXX","","US ARMY","SGT","WORLD WAR II"', 'war': None, 'v_mid_name': None, 'cem_url': None, 'cem_name': None, 'relationship': None, 'v_first_name': None, 'se one, 'cem_addr_one': None, 'd_birth_date': None, 'd_death_date': None}
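As you can see, the header fields (the first line of the CSV) are not correctly associated with each subsequent row.
Am I doing something wrong, or is the CSV of poor quality?
Thanks to Casey for asking whether I had opened the file in another program. Excel had messed up the file.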
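One way the output above could arise is sketched below. It assumes (this is not confirmed by the asker) that re-saving the file wrapped each data row in a single pair of quotes and doubled the inner quotes. The three-column header is a shortened, hypothetical stand-in for the real 25-column file, and the sketch uses Python 3 with in-memory strings rather than the original Python 2 file handling.

```python
import csv
import io

# Hypothetical three-column sample; the real file has 25 columns.
header = 'd_first_name,d_mid_name,d_last_name\n'

# A well-formed data row: three separate fields.
clean = header + 'Joe,"E","JoJo"\n'

# Assumed damage: the whole row wrapped in one pair of quotes,
# with the inner quotes doubled, so it parses as a single field.
mangled = header + '"Joe,""E"",""JoJo"""\n'

clean_row = next(csv.DictReader(io.StringIO(clean)))
mangled_row = next(csv.DictReader(io.StringIO(mangled)))

print(dict(clean_row))    # each header maps to its own value
print(dict(mangled_row))  # first header holds the whole row, the rest are None
```

In the mangled case the first key receives the entire row as one string and every other key is filled with the default restval of None, which matches the shape of the output in the question.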
Strange, I get different output than you do.
data.csv:
d_first_name,d_mid_name,d_last_name,d_suffix,d_birth_date,d_death_date,section_id,row_num,site_num,cem_name,cem_addr_one,cem_addr_two,city,state,zip,cem_url,cem_phone,relationship,v_first_name,v_mid_name,v_last_name,v_suffix,branch,rank,war
"Emil","E","Seibel","","10/02/1920","03/12/2010","139-E","","3","HAWAII STATE VETERANS CEMETERY","KAMEHAMEHA HIGHWAY","","KANEOHE","HI","96744","","808-233-3630","Veteran (Self)","Emil","E","Seibel","","US ARMY","SGT","WORLD WAR II",
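Script:
Output:
csv.DictReader should pick up the field names from the first line of the file automatically when the fieldnames parameter is omitted, as described in the docs. The None: [''] entry in the output is caused by the trailing comma at the end of each data row.
Working code example: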
http://codepad.org/HdBhr4La
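The effect of a trailing comma can be reproduced with a small in-memory sample. This is a minimal Python 3 sketch, not the code behind the link above; the two-column sample is a hypothetical stand-in for the real file.

```python
import csv
import io

# Two-column stand-in for the real file; note the trailing comma on the data row.
sample = 'd_first_name,d_last_name\n"Emil","Seibel",\n'

row = next(csv.DictReader(io.StringIO(sample)))

# The extra empty field has no header, so DictReader collects it
# under restkey, which is None by default.
extra = row[None]

# One way to discard the leftover fields before using the row.
cleaned = {key: value for key, value in row.items() if key is not None}

print(extra)
print(cleaned)
```

Passing a restkey argument to csv.DictReader would store the leftover fields under a name of your choice instead of None.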
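Just try it; it handles your file fine (renamed to foo).
Output:
Looking at the original file, which I downloaded here, it is valid CSV. I misread the output of your script.
Because you used csv.DictReader, each row is converted into a dictionary with the header values as keys and that row's data as values. I ran it on the same file and everything appears to be matched up correctly, although I did not check the whole file.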
According to the Python docs:
class csv.DictReader(csvfile[, fieldnames=None[, restkey=None[, restval=None[, dialect='excel'[, *args, **kwds]]]]])
If this is not the format you want, you could try csv.reader, which simply returns a list for each row without associating it with the headers.
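For comparison, a minimal Python 3 sketch of csv.reader on a shortened, hypothetical three-column sample: every row, including the header, comes back as a plain list.

```python
import csv
import io

# Hypothetical three-column sample standing in for the real file.
sample = 'd_first_name,d_last_name,branch\n"Emil","Seibel","US ARMY"\n'

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # first row: the column names as a plain list
rows = list(reader)    # remaining rows: one list of strings per line

print(header)
print(rows)
```

With this form, fields are addressed by position (for example rows[0][2]) rather than by header name.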
To use DictReader as above, this is probably what you want:
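The answer's original code was not preserved, so what follows is only a sketch of one plausible approach in Python 3. The helper name read_rows and the temporary four-column file are assumptions made for illustration; with the real download you would pass its path instead.

```python
import csv
import os
import tempfile


def read_rows(path):
    """Return every data row of a CSV file as a dict keyed by the header fields."""
    # newline='' lets the csv module handle line endings itself (Python 3).
    with open(path, newline='') as handle:
        return list(csv.DictReader(handle))


# Small stand-in file so the sketch runs without the real download.
with tempfile.NamedTemporaryFile('w', suffix='.csv', newline='', delete=False) as tmp:
    tmp.write('d_first_name,d_last_name,branch,war\n')
    tmp.write('"Emil","Seibel","US ARMY","WORLD WAR II"\n')

rows = read_rows(tmp.name)
os.remove(tmp.name)

# Fields are looked up by header name rather than by position.
for row in rows:
    print(row['d_first_name'], row['d_last_name'], row['branch'], row['war'])
```

Note that the question's code targets Python 2, where the file is opened in 'rb' mode; in Python 3 the csv module expects a text-mode file opened with newline=''.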