Creating a list of dictionaries from a large CSV

[{'value1': '20150302', 'value2': '20150225','value3': '5', 'IS_SHOP': '1', 'value4': '0', 'value5': 'GA321D01H-K12'}, {'value1': '20150302', 'value2': '20150225', 'value3': '1', 'value4': '0', 'value5': '1', 'value6': 'GA321D01H-K12'}]
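For context, here is a minimal sketch of what the question is asking for. `csv.DictReader` already turns each CSV row into a dict keyed by the header row, so wrapping it in `list(...)` produces exactly this kind of list. The catch is that the whole file ends up in memory at once. The sample CSV text and the `load_all_rows` name are hypothetical, and the column names only mirror the keys shown above.

```python
import csv
import io

# Hypothetical stand-in for the large CSV file; the column names mirror
# the keys in the sample output above.
SAMPLE_CSV = """value1,value2,value3,IS_SHOP,value4,value5
20150302,20150225,5,1,0,GA321D01H-K12
20150302,20150225,1,0,0,GA321D01H-K12
"""

def load_all_rows(f):
    """Materialize every row as a dict; memory use grows with the file size."""
    return list(csv.DictReader(f))

rows = load_all_rows(io.StringIO(SAMPLE_CSV))
```

Every value comes back as a `str` (for example `rows[0]['IS_SHOP'] == '1'`), because `csv` does no type conversion.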

1 answer

User

Answer #1 · Posted 2024-09-29 23:21:19

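If the goal is to convert CSV to Avro, there is no reason to store the complete list of input values. Doing so defeats the whole purpose of using a generator. After you set up a schema, fastavro's writer appears to be designed to take an iterable and write it out one record at a time, so you can pass the generator to it directly. Your code can simply skip the step that builds the list (note: naming a variable list is a bad idea, because it shadows the built-in name list) and use the generator directly: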

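import csv
from fastavro import writer

def csv_reader():
    # newline='' is what the csv module docs recommend for files passed to csv readers
    with open('export.csv', newline='') as f:
        reader = csv.DictReader(f)
        for row in reader:
            yield row  # one dict per CSV row, produced lazily

    # If this is Python 3.3+, you could simplify the body further to just:
    #     with open('export.csv', newline='') as f:
    #         yield from csv.DictReader(f)

# The schema could be built from the keys of the first row (written out manually),
# or you can provide an explicit schema with documentation for each field.
schema = {...}  # placeholder: replace with an Avro record schema

with open('export.avro', 'wb') as out:
    # writer consumes the generator one record at a time
    writer(out, schema, csv_reader())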
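The comment in the code above suggests building the schema from the keys of the first row. A minimal sketch of that idea follows. It assumes every column can be typed `"string"`, since `csv.DictReader` yields all values as `str`. The helper name `schema_from_header` and the record name `"Row"` are hypothetical placeholders.

```python
import csv
import io

def schema_from_header(fieldnames, name="Row"):
    """Build an Avro record schema (as a plain dict) from CSV column names.

    Assumption: every column is typed "string", because csv.DictReader
    returns all values as str. "Row" is a placeholder record name.
    """
    return {
        "type": "record",
        "name": name,
        "fields": [{"name": col, "type": "string"} for col in fieldnames],
    }

# Reading .fieldnames consumes only the header line of the file.
header_only = io.StringIO("value1,value2,value3,IS_SHOP,value4,value5\n")
schema = schema_from_header(csv.DictReader(header_only).fieldnames)
```

Avro field names must start with a letter or underscore and contain only letters, digits and underscores. CSV headers that contain spaces or dashes would need to be renamed before they are used this way.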

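With this approach, the generator produces one row at a time and writer writes one row at a time. Each input row is discarded after it has been written, so memory usage stays minimal.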
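To see this streaming pattern without fastavro installed, here is a minimal sketch. A hypothetical `write_records` function stands in for the Avro writer and writes JSON Lines instead of Avro (a swapped-in output format, chosen only for illustration). It pulls one record from the generator, writes it, and moves on, so no list of rows is ever built.

```python
import csv
import io
import json

def csv_rows(f):
    # Lazily yield one dict per CSV row; nothing is accumulated here.
    yield from csv.DictReader(f)

def write_records(out, records):
    """Hypothetical stand-in for a streaming writer: consume one record,
    write it, then drop it before fetching the next one."""
    count = 0
    for record in records:
        out.write(json.dumps(record) + "\n")
        count += 1
    return count

src = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
dst = io.StringIO()
written = write_records(dst, csv_rows(src))
```

Only the current row is referenced at any moment, which is why memory use does not grow with the number of rows.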

If you need to modify the rows, modify row inside the generator before you yield it.
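As a minimal sketch of modifying each row before yielding it, the transformations below are purely hypothetical examples. They strip surrounding whitespace from every value and convert a `value3` column to `int`. If a field's type changes like this, the corresponding field in the Avro schema would need to change too (for example to `"int"`).

```python
import csv
import io

def cleaned_rows(f):
    """Yield each CSV row after transforming it (hypothetical transformations)."""
    for row in csv.DictReader(f):
        # Build a cleaned copy of the row: strip whitespace from every value.
        row = {key: value.strip() for key, value in row.items()}
        # Convert one column to an integer before handing the row on.
        row["value3"] = int(row["value3"])
        yield row

src = io.StringIO("value1,value3\n 20150302 ,5\n20150302, 1\n")
rows = list(cleaned_rows(src))  # list() only for demonstration
```

In real use you would pass `cleaned_rows(f)` straight to the writer instead of wrapping it in `list()`, so the rows keep streaming.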
