Merging or concatenating multiple files into one in GCS using a Cloud Function

Posted 2024-10-02 10:19:37


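I am trying to recreate this script, which merges CSV files in GCS that share the same schema into a single CSV file, but I can't get it to work. I keep getting an error.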

I have 3 files, position1.csv, position2.csv and position3.csv. My bucket is named gatest and the subfolder is Extract.

Error message: Error: function terminated. Recommended action: inspect logs for termination reason. Details: 'name'

import logging

from google.cloud import storage


def compose_shards(data, context):
    num_shards = 3

    prefix = 'Extract/position'
    outfile = 'Extract/full_position_data.csv'
    filename = data['name']    #keep getting error here with only 'name', what should be the expected value here?
    # Shards are numbered from 0, so the last one is num_shards - 1.
    last_shard = '-%05d-of-%05d' % (num_shards - 1, num_shards)
    if prefix in filename and last_shard in filename:
        prefix = filename.replace(last_shard, '')
        client = storage.Client()
        bucket = client.bucket(data['bucket']) #i tried replacing bucket with my gcs bucket name but it didnt work, also having error here
        blobs = []
        # The rest uses `bucket` and `blobs`, so it has to stay inside the `if`.
        for shard in range(num_shards):
            # Use the same 0-based numbering as last_shard above.
            sfile = '%s-%05d-of-%05d' % (prefix, shard, num_shards)
            blob = bucket.blob(sfile)
            if not blob.exists():
                raise ValueError('Shard {} not present'.format(sfile))
            blobs.append(blob)

        # Concatenate the shards server-side, then remove them.
        bucket.blob(outfile).compose(blobs)
        logging.info('Successfully created {}'.format(outfile))
        for blob in blobs:
            blob.delete()
        logging.info('Deleted {} shards'.format(len(blobs)))
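
The script above only reacts to names like position-00002-of-00003, so files named position1.csv, position2.csv and position3.csv would never match it. The 'name' in the error details also looks like a KeyError, which can happen when the function is invoked with a payload that has no 'name' key (for example an empty test event), though that is a guess from the message alone. Below is a minimal sketch of one possible approach, assuming the function is deployed with a Cloud Storage "finalize" trigger on the bucket and that google-cloud-storage is installed. The names select_parts and compose_positions are hypothetical, not part of any library.

```python
import logging

PREFIX = 'Extract/position'                 # from the question
OUTFILE = 'Extract/full_position_data.csv'  # from the question
EXPECTED_PARTS = 3                          # position1.csv .. position3.csv


def select_parts(names, prefix=PREFIX, outfile=OUTFILE):
    """Return the CSV object names to merge, sorted, without the output file."""
    return sorted(
        n for n in names
        if n.startswith(prefix) and n.endswith('.csv') and n != outfile
    )


def compose_positions(data, context):
    """Hypothetical entry point for a Cloud Storage finalize trigger."""
    data = data or {}
    name = data.get('name')
    bucket_name = data.get('bucket')
    if not name or not bucket_name:
        # No 'name'/'bucket' in the event: log what did arrive and stop.
        logging.warning('Event has no name/bucket; keys: %s', sorted(data))
        return
    if not select_parts([name]):
        return  # unrelated upload, or the merged file itself

    # Imported here so the checks above run even without the library.
    from google.cloud import storage

    client = storage.Client()
    listed = client.list_blobs(bucket_name, prefix=PREFIX)
    parts = select_parts(blob.name for blob in listed)
    if len(parts) < EXPECTED_PARTS:
        logging.info('Only %d of %d files present', len(parts), EXPECTED_PARTS)
        return

    bucket = client.bucket(bucket_name)
    # compose() concatenates the source objects server-side, in list order.
    bucket.blob(OUTFILE).compose([bucket.blob(p) for p in parts])
    logging.info('Created %s from %d files', OUTFILE, len(parts))
```

Two caveats with this sketch: compose concatenates raw bytes, so if every CSV has a header row the header will repeat in the merged file, and a single compose call accepts at most 32 source objects. The sort is lexicographic, which is fine for three files but would put position10.csv before position2.csv.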

