How do I read the header row from inFile and write it to outFile? (Python 3)

Posted 2024-06-02 17:48:58


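I call next() on the reader so that only the data, not the header row, gets parsed, but as a result the read skips the first line entirely. How can I capture the header during the read (without parsing it as data) and then write it out during the write?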

The real dataset I'm working on has 30 columns and 80k rows, so I'm trying to do all of this in a single read pass.

Test data:

date, animal, color
3/14/2015, cat, blue
3/24/2015, dog, green

Code:

from dateutil.parser import *
import csv

with open('testin.csv', 'r', encoding='utf-8') as inFile, open('testout.csv', 'w', encoding='utf-8') as outFile:
    exampleReader = csv.reader(inFile)
    next(exampleReader, 1)  # returns the header row, but the value is thrown away
    exampleData = list(exampleReader)
    exampleWriter = csv.writer(outFile)
    # print a few to see what it's doing
    print('the list', exampleData)
    for item in exampleData:
        item[0] = str(parse(item[0])) # converting date format for sqlite
        del item[2] # dropping column that is not needed
        print('date corrected', item) 
        exampleWriter.writerow(item)
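In the code above, `next(exampleReader, 1)` already returns the header row; the second argument is only a fallback that `next()` returns when the reader has no rows left, and the returned header is simply discarded. A small in-memory check, with `io.StringIO` standing in for `testin.csv`, illustrates this:

```python
import csv
import io

# In-memory stand-in for testin.csv (same test data as above)
sample = io.StringIO(
    "date, animal, color\n"
    "3/14/2015, cat, blue\n"
    "3/24/2015, dog, green\n"
)

reader = csv.reader(sample)

# next(iterator, default) returns the next item; `default` is only
# returned when the iterator is already exhausted.
first = next(reader, 1)
print(first)            # ['date', ' animal', ' color'] -- the header row

rest = list(reader)
print(len(rest))        # 2 data rows remain

# Once the reader is exhausted, the default comes back instead of StopIteration
print(next(reader, 1))  # 1
```

So the fix is not a different read call, just keeping what `next()` returns.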

2 answers

Write the header before processing the rest of the input file:

from dateutil.parser import parse
import csv

with open('testin.csv', 'r', encoding='utf-8', newline='') as inFile, open('testout.csv', 'w', encoding='utf-8', newline='') as outFile:
    exampleReader = csv.reader(inFile)
    header = next(exampleReader)  # keep the header row instead of discarding it

    exampleWriter = csv.writer(outFile)
    del header[2]    # drop the column from the header
    exampleWriter.writerow(header)

    for row in exampleReader:
        row[0] = parse(row[0]) # converting date format for sqlite
        del row[2] # dropping column that is not needed
        print('date corrected', row) 
        exampleWriter.writerow(row)

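I rearranged a few things, but the main point is to read the header into a variable with next(), delete the unneeded column from the header, and write the header to the output file. Then process the rest of the input file.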
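The same steps can be tried without any files, using `io.StringIO` in place of `testin.csv` and `testout.csv`. This sketch swaps in `datetime.strptime` with an assumed `M/D/YYYY` format instead of `dateutil.parser.parse`, so it needs only the standard library; `parse` accepts many more date formats.

```python
import csv
import io
from datetime import datetime

# Stand-ins for testin.csv / testout.csv so the example runs without files
inFile = io.StringIO(
    "date, animal, color\n"
    "3/14/2015, cat, blue\n"
    "3/24/2015, dog, green\n"
)
outFile = io.StringIO()

reader = csv.reader(inFile)
writer = csv.writer(outFile)

header = next(reader)   # ['date', ' animal', ' color']
del header[2]           # drop the 'color' column from the header
writer.writerow(header)

for row in reader:      # remaining rows are read one at a time
    # ASSUMPTION: dates are always M/D/YYYY; strptime stands in for dateutil's parse
    row[0] = datetime.strptime(row[0].strip(), "%m/%d/%Y")
    del row[2]          # drop the same column from every data row
    writer.writerow(row)  # csv.writer calls str() on the datetime value

# Read the output back to check it
result = list(csv.reader(io.StringIO(outFile.getvalue())))
print(result)
# [['date', ' animal'], ['2015-03-14 00:00:00', ' cat'], ['2015-03-24 00:00:00', ' dog']]
```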

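Just as important, the rest of the input file is processed row by row in the for loop. There is no need to read the whole file into a list first when you can iterate over it directly.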
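As a variation on the same streaming idea (a swapped-in technique, not what this answer uses), the standard library's `csv.DictReader` reads the header row itself and `csv.DictWriter.writeheader()` writes it back out. A minimal sketch, with the date conversion left out to keep the focus on the header and `io.StringIO` standing in for the files:

```python
import csv
import io

inFile = io.StringIO(
    "date, animal, color\n"
    "3/14/2015, cat, blue\n"
    "3/24/2015, dog, green\n"
)
outFile = io.StringIO()

# skipinitialspace=True strips the space after each comma, so the
# header keys become 'date', 'animal', 'color'
reader = csv.DictReader(inFile, skipinitialspace=True)

# Accessing reader.fieldnames reads the header row; keep all but 'color'
keep = [name for name in reader.fieldnames if name != "color"]

# extrasaction='ignore' silently drops keys (here 'color') not in fieldnames
writer = csv.DictWriter(outFile, fieldnames=keep, extrasaction="ignore")
writer.writeheader()

for row in reader:        # still one row at a time
    writer.writerow(row)

lines = outFile.getvalue().splitlines()
print(lines)  # ['date,animal', '3/14/2015,cat', '3/24/2015,dog']
```

Selecting columns by name also keeps working if the column order in the input changes.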

You can also write the rows efficiently with a generator expression:

from dateutil.parser import parse
import csv

def process_row(row, is_header=False):
    if not is_header:
        row[0] = parse(row[0])
    del row[2]
    return row

with open('testin.csv', 'r', encoding='utf-8', newline='') as inFile, open('testout.csv', 'w', encoding='utf-8', newline='') as outFile:
    exampleReader = csv.reader(inFile)
    header = next(exampleReader)

    exampleWriter = csv.writer(outFile)
    exampleWriter.writerow(process_row(header, is_header=True))

    exampleWriter.writerows(process_row(row) for row in exampleReader)
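With 30 columns, more than one column may need to go. Calling `del row[i]` several times is error-prone because each deletion shifts the indexes after it. A sketch that builds the list of indexes to keep once, from the header, and reuses it for every row; `DROP` and `keep_columns` are hypothetical names, and the four-column sample data is made up for illustration:

```python
import csv
import io

# HYPOTHETICAL: positions of the columns to drop
DROP = {2, 3}

def keep_columns(row, keep):
    """Return only the fields at the positions listed in `keep`."""
    return [row[i] for i in keep]

inFile = io.StringIO(
    "date, animal, color, junk\n"
    "3/14/2015, cat, blue, aaa\n"
    "3/24/2015, dog, green, bbb\n"
)
outFile = io.StringIO()

reader = csv.reader(inFile, skipinitialspace=True)
writer = csv.writer(outFile)

header = next(reader)
keep = [i for i in range(len(header)) if i not in DROP]  # computed once: [0, 1]

writer.writerow(keep_columns(header, keep))
# writerows consumes the generator lazily, one row at a time
writer.writerows(keep_columns(row, keep) for row in reader)

lines = outFile.getvalue().splitlines()
print(lines)  # ['date,animal', '3/14/2015,cat', '3/24/2015,dog']
```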

I would use pandas to handle this much data:

import io
import pandas as pd

data = """\
date, animal, color, junk
3/14/2015, cat, blue, aaa
3/24/2015, dog, green, bbb
"""
num_cols = 4
all_cols = set(range(num_cols))
skip_cols = set([2,3])  # drop 'color' and 'junk'

# replace `io.StringIO(data)` with the CSV filename    
df = pd.read_csv(io.StringIO(data),
                 sep=',',
                 skipinitialspace=True,
                 parse_dates=[0],
                 usecols=(all_cols - skip_cols))
print(df)

# save DF as CSV file
df.to_csv('/path/to/new.csv', index=False)

# save DF to SQLite DB
import sqlalchemy
engine = sqlalchemy.create_engine('sqlite:///my_db.sqlite')
df.to_sql('my_table', engine, if_exists='replace')

Example:

In [150]: data = """\
   .....: date, animal, color, junk
   .....: 3/14/2015, cat, blue, aaa
   .....: 3/24/2015, dog, green, bbb
   .....: """

In [151]: num_cols = 4

In [152]: all_cols = set(range(num_cols))

In [153]: skip_cols = set([2,3])

In [154]: df = pd.read_csv(io.StringIO(data),
   .....:                  sep=',',
   .....:                  skipinitialspace=True,
   .....:                  parse_dates=['date'],
   .....:                  usecols=(all_cols - skip_cols))

In [155]: print(df)
        date animal
0 2015-03-14    cat
1 2015-03-24    dog
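If pandas and SQLAlchemy are not available, the standard library's `sqlite3` module can load the same rows. A minimal sketch, assuming the cleaned CSV holds only the date and animal columns with dates in `M/D/YYYY` form; `my_table` is reused from the pandas answer, and an in-memory database stands in for `my_db.sqlite`:

```python
import csv
import io
import sqlite3
from datetime import datetime

# Stand-in for the cleaned CSV (date and animal columns only)
cleaned = io.StringIO(
    "date,animal\n"
    "3/14/2015,cat\n"
    "3/24/2015,dog\n"
)

conn = sqlite3.connect(":memory:")  # pass a filename such as 'my_db.sqlite' to persist
conn.execute("CREATE TABLE my_table (date TEXT, animal TEXT)")

reader = csv.reader(cleaned)
next(reader)  # skip the header; the column names live in CREATE TABLE

# ASSUMPTION: M/D/YYYY input. ISO 8601 text (YYYY-MM-DD) sorts correctly
# in SQLite and works with its built-in date functions.
rows = (
    (datetime.strptime(date, "%m/%d/%Y").date().isoformat(), animal)
    for date, animal in reader
)
conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)  # streams the generator
conn.commit()

result = conn.execute("SELECT date, animal FROM my_table ORDER BY date").fetchall()
print(result)  # [('2015-03-14', 'cat'), ('2015-03-24', 'dog')]
conn.close()
```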
