Python csv.DictReader not working with a data.gov CSV

Posted 2024-06-15 09:59:52


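I am using a dataset from data.gov, for example "Gravesite locations of Veterans and beneficiaries in Hawaii, as of January 2011" (http://www.data.gov/raw/4608), and trying to parse the CSV with Python and process each row: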

import csv

randomData = csv.DictReader(open('/downloads/ngl_hawaii.csv', 'rb'), delimiter=",")
for row in randomData:
    print row

Sample CSV data:

d_first_name,d_mid_name,d_last_name,d_suffix,d_birth_date,d_death_date,section_id,row_num,site_num,cem_name,cem_addr_one,cem_addr_two,city,state,zip,cem_url,cem_phone,relationship,v_first_name,v_mid_name,v_last_name,v_suffix,branch,rank,war

Joe,"E","JoJo","","10/02/1920","03/12/2000","100-E","","3","HAWAII STATE VETERANS CEMETERY","KAMEHAMEHA HIGHWAY","","KANEOHE","HI","111444","","SXXXXX","Veteran (Self)","Joe","E","JoJo","","US ARMY","SGT","WORLD WAR II"

The result is not pretty (one printed row):

{'v_last_name': None, 'cem_addr_two': None, 'rank': None, 'd_suffix': None, 'city': None, 'row_num': None, 'zip': None, 'cem_phone': None, 'd_last_name': None, e, 'd_first_name': 'Joe,"E","JoJo","","10/02/1920","03/12/2000","100-E","","3","HAWAII STATE VETERANS CEMETERY","KAMEHAMEHA HIGHWAY","","KANEOHE","HI","11144 "SXXXXX","","US ARMY","SGT","WORLD WAR II"', 'war': None, 'v_mid_name': None, 'cem_url': None, 'cem_name': None, 'relationship': None, 'v_first_name': None, 'se one, 'cem_addr_one': None, 'd_birth_date': None, 'd_death_date': None}

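As you can see, the header fields (the first line of the CSV) are not matched to the values in each subsequent row: the whole data row ends up under d_first_name and every other key is None.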

Am I doing something wrong, or is the CSV of poor quality?

Thanks to Casey for asking whether I had opened the file in another program. Excel had messed up the file...
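One way to spot this kind of damage before blaming the parser is to compare the number of fields in each row with the number of header fields. The following is only a rough sketch in Python 3 syntax (the thread itself uses Python 2). The sample data and the helper name find_bad_rows are made up for illustration, and it assumes the damage looks like the output above, where an entire row has been collapsed into a single quoted field.

```python
import csv
import io

# Hypothetical sample: the second data row has been wrapped in one pair of
# quotes with the inner quotes doubled, so it parses as a single field.
sample = (
    'd_first_name,d_mid_name,d_last_name\n'
    'Joe,"E","JoJo"\n'
    '"Joe,""E"",""JoJo"""\n'
)

def find_bad_rows(csvfile):
    """Return (line_number, field_count) for rows that do not match the header."""
    reader = csv.reader(csvfile)
    header = next(reader)
    return [
        (reader.line_num, len(row))
        for row in reader
        if len(row) != len(header)
    ]

bad_rows = find_bad_rows(io.StringIO(sample))
print(bad_rows)
```

For a real file, the same helper could be called on a file object opened with open(path, newline=''). An empty result would suggest that the row lengths are consistent with the header.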


3 Answers

Strange, I get different output than you do.

data.csv:

d_first_name,d_mid_name,d_last_name,d_suffix,d_birth_date,d_death_date,section_id,row_num,site_num,cem_name,cem_addr_one,cem_addr_two,city,state,zip,cem_url,cem_phone,relationship,v_first_name,v_mid_name,v_last_name,v_suffix,branch,rank,war
"Emil","E","Seibel","","10/02/1920","03/12/2010","139-E","","3","HAWAII STATE VETERANS CEMETERY","KAMEHAMEHA HIGHWAY","","KANEOHE","HI","96744","","808-233-3630","Veteran (Self)","Emil","E","Seibel","","US ARMY","SGT","WORLD WAR II",

Script:

import csv

for line in csv.DictReader(open('data.csv', 'rb'), delimiter=","):
    print line

Output: (not preserved in the source)

csv.DictReader should take the field names from the first row of the file automatically when the fieldnames argument is omitted, as described in the docs.

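The None: [''] entry in the output is caused by the trailing comma on each data row. The trailing comma produces one more (empty) field than there are header names, and DictReader stores surplus fields as a list under restkey, which defaults to None.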
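To see the trailing-comma effect in isolation, here is a minimal sketch in Python 3 syntax (the thread itself uses Python 2). The two-column sample is made up and is not the real file. It also shows one possible way to discard the surplus entry.

```python
import csv
import io

# Made-up two-column sample; the data row ends with a trailing comma.
sample = 'first,last\n"Emil","Seibel",\n'

rows = list(csv.DictReader(io.StringIO(sample)))
row = rows[0]

# The extra empty field is stored as a list under restkey (default: None).
print(row[None])

# Drop the surplus entry if it is not wanted.
row.pop(None, None)
print(dict(row))
```

Whether to drop the None key or leave it alone depends on what the rest of the script expects. Lookups by column name work either way.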

Working code example:

http://codepad.org/HdBhr4La

I just tried this, and it handles your file fine (renamed to foo.csv):

import csv

ifile  = open('foo.csv', "rb")
reader = csv.reader(ifile)

rownum = 0
for row in reader:
    # Save header row.
    if rownum == 0:
        header = row
    else:
        colnum = 0
        for col in row:
            print '%-8s: %s' % (header[colnum], col)
            colnum += 1

    rownum += 1

ifile.close()

Output: (not preserved in the source)

Looking at the raw file I downloaded here, it is valid CSV. I misread the output of your script.

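Because you are using csv.DictReader, each row is converted into a dictionary with the header values as keys and that row's data as values. I ran it on the same file and everything appears to match up correctly, although I did not check the whole file.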

According to the Python docs:

class csv.DictReader(csvfile[, fieldnames=None[, restkey=None[, restval=None[, dialect='excel'[, *args, **kwds]]]]])

Create an object which operates like a regular reader but maps the information read into a dict whose keys are given by the optional fieldnames parameter. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames. If the row read has more fields than the fieldnames sequence, the remaining data is added as a sequence keyed by the value of restkey. If the row read has fewer fields than the fieldnames sequence, the remaining keys take the value of the optional restval parameter. Any other optional or keyword arguments are passed to the underlying reader instance.
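The restkey and restval behaviour described in that passage can be illustrated with a small sketch. It uses Python 3 syntax and made-up sample data. The key name 'extra' and the empty-string fill value are arbitrary choices for the demonstration, not anything the data.gov file requires.

```python
import csv
import io

sample = (
    'a,b,c\n'
    '1,2,3,4,5\n'   # two more fields than the header
    '6\n'           # two fewer fields than the header
)

reader = csv.DictReader(io.StringIO(sample), restkey='extra', restval='')
long_row, short_row = (dict(r) for r in reader)

print(long_row)    # surplus fields are collected in a list under 'extra'
print(short_row)   # missing fields are filled with restval
```

Giving restkey an explicit name can make rows with too many fields easier to notice than the default None key.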

If this is not the format you want, you could try csv.reader, which simply returns a list for each row without associating it with the headers.

To use DictReader as above, this is probably what you want:

import csv
reader = csv.DictReader(open('ngl_hawaii.csv', 'rb'), delimiter=',')
for row in reader:
    print row['d_first_name']
    print row['d_last_name']
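
The snippet above is Python 2. For anyone on Python 3, a rough equivalent might look like the sketch below. The main differences are the print function and opening the file in text mode with newline='' rather than 'rb', as the csv module docs recommend. The temporary sample file is a hypothetical stand-in for ngl_hawaii.csv so that the example is self-contained.

```python
import csv
import os
import tempfile

# Hypothetical stand-in for ngl_hawaii.csv so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'ngl_hawaii_sample.csv')
with open(path, 'w', newline='') as f:
    f.write('d_first_name,d_mid_name,d_last_name\n"Emil","E","Seibel"\n')

names = []
# Python 3: open in text mode with newline='' instead of 'rb'.
with open(path, newline='') as f:
    for row in csv.DictReader(f):
        names.append((row['d_first_name'], row['d_last_name']))

print(names)
```

Using a with block also closes the file automatically, which the one-line open() call in the original does not do explicitly.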
