How do I use csv.DictReader to read a CSV stored in S3?

import boto3, csv

session = boto3.session.Session(aws_access_key_id=<>, aws_secret_access_key=<>, region_name=<>)
s3_resource = session.resource('s3')
s3_object = s3_resource.Object(<bucket>, <key>)
streaming_body = s3_object.get()['Body']
#csv.DictReader(???)

1 answer

User

#1 · Posted on 2024-06-01 12:08:08

The code is as follows:

import boto3
import csv

# get a handle on s3
s3 = boto3.resource(u's3')

# get a handle on the bucket that holds your file
bucket = s3.Bucket(u'bucket-name')

# get a handle on the object you want (i.e. your file)
obj = bucket.Object(key=u'test.csv')

# get the object
response = obj.get()

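# read the contents of the file and split it into a list of lines
# (splitlines() splits on line breaks only; split() would also break
# on spaces inside field values)
# note: the body is a stream and can only be read once, so use exactly
# one of the two variants below

# for python 2:
# lines = response[u'Body'].read().splitlines()

# for python 3 you need to decode the incoming bytes:
lines = response['Body'].read().decode('utf-8').splitlines()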

# now iterate over those lines
for row in csv.DictReader(lines):

    # here you get a sequence of dicts
    # do whatever you want with each line here
    print(row)

You can condense this in real code, but I tried to go step by step to show the object hierarchy in boto3.
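A condensed variant might look like the sketch below. It is not the answer's exact code: the helper name rows_from_bytes is hypothetical, and it swaps the list of lines for io.StringIO with newline="" so that values containing spaces or quoted line breaks reach the csv module unchanged. A literal byte string stands in for the S3 payload so the snippet runs without AWS credentials.

```python
import csv
import io


def rows_from_bytes(data, encoding="utf-8"):
    """Decode a CSV payload held fully in memory and return its rows as dicts.

    rows_from_bytes is a hypothetical helper name used for illustration.
    """
    # newline="" leaves line endings untouched so the csv module can
    # handle quoted fields that contain line breaks.
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.DictReader(text))


# With boto3 this would be called roughly as follows
# (bucket name and key are placeholders):
#   body = boto3.resource("s3").Object("bucket-name", "test.csv").get()["Body"]
#   rows = rows_from_bytes(body.read())
sample = b"name,city\nAda,New York\nLin,Sao Paulo\n"
for row in rows_from_bytes(sample):
    print(row["name"], "->", row["city"])
```

This still loads the whole object into memory, so it only suits files that comfortably fit there.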

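Edit, in response to your comment about avoiding reading the whole file into memory: I have not run into that requirement myself, so I can't speak authoritatively, but I would try wrapping the stream to get a text-file-like iterator. For example, you could use the codecs library and replace the csv parsing part above with something like this: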

import codecs

for row in csv.DictReader(codecs.getreader('utf-8')(response[u'Body'])):
    print(row)
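To try the wrapping idea without touching S3, here is a minimal sketch in which io.BytesIO stands in for response['Body']. The generator name iter_csv_rows is hypothetical, and the sketch assumes only that the body exposes a read(size) method, which is all codecs needs from the underlying stream.

```python
import codecs
import csv
import io


def iter_csv_rows(binary_stream, encoding="utf-8"):
    """Yield one dict per CSV row, decoding the byte stream incrementally.

    iter_csv_rows is a hypothetical helper name used for illustration.
    """
    # codecs.getreader returns a StreamReader class; instantiating it wraps
    # the byte stream in an object that yields decoded text lines.
    text_stream = codecs.getreader(encoding)(binary_stream)
    for row in csv.DictReader(text_stream):
        yield row


# io.BytesIO is a stand-in for the S3 body (assumption for this demo)
fake_body = io.BytesIO("id,name\n1,Zoë\n2,Bob Smith\n".encode("utf-8"))
for row in iter_csv_rows(fake_body):
    print(row)
```

Because rows are produced lazily, only a small buffer of decoded text is held at any time, rather than the whole file.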
