Showing a progress bar when uploading to S3 with a presigned URL

Posted 2024-05-19 22:26:43


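I am uploading a file to my S3 bucket using a presigned URL. It works fine and the data reaches the bucket, but the files I upload are very large and I need to show a progress bar. I have tried many solutions from StackOverflow and other blog posts, but none of them worked.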

Here is the code snippet that uploads the data to S3 using the presigned URL:

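import requests

object_name = 'DataSet.csv'
# create_presigned_post() is a helper defined elsewhere (not shown here) that
# returns a dict with the 'url' and 'fields' of a presigned POST.
response = create_presigned_post("mybucket_name", object_name)

fields = response['fields']
with open(object_name, 'rb') as f:
    files = {'file': (object_name, f)}
    # stream=True only affects how the response body is downloaded,
    # so it has no effect on the upload itself.
    http_response = requests.post(response['url'], data=fields, files=files, stream=True)

print(http_response.status_code)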

It returns status 204, which means the upload succeeded.

What changes can I make to this code to show a progress bar?

P.S. I tried stream=True in the request, but it did not help. I also tried iterating over the response with tqdm, but that did not work in this case either.


2 Answers

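I don't think there is a way to do this when a large file is uploaded through a presigned URL with a single, plain HTTP POST request. You can achieve it with the multipart upload mechanism of AWS S3 instead. That way you know when each part has been uploaded and can calculate the progress from that. I wrote an article with code snippets (TypeScript) that combine multipart upload with presigned URLs: https://www.altostra.com/blog/multipart-uploads-with-s3-presigned-url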
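To make the "calculate the progress from the uploaded parts" idea concrete, here is a minimal Python sketch. The names `plan_parts` and `percent_done` are made up for illustration and are not part of any AWS library. It only does the arithmetic: it splits a file size into part lengths (S3 requires every part except the last to be at least 5 MiB) and reports what percentage of the bytes belongs to parts that have already finished.

```python
import math
from typing import Iterable, List, Tuple

# S3 requires every part except the last one to be at least 5 MiB.
MIN_PART_SIZE = 5 * 1024 * 1024


def plan_parts(file_size: int, part_size: int = MIN_PART_SIZE) -> List[Tuple[int, int]]:
    """Return (part_number, length) pairs that cover file_size bytes.

    Part numbers start at 1, as S3 expects.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    count = math.ceil(file_size / part_size)
    return [
        (number, min(part_size, file_size - (number - 1) * part_size))
        for number in range(1, count + 1)
    ]


def percent_done(parts: List[Tuple[int, int]], completed: Iterable[int]) -> float:
    """Percentage of bytes that belong to parts whose upload has finished."""
    finished = set(completed)
    total = sum(length for _, length in parts)
    done = sum(length for number, length in parts if number in finished)
    return 100.0 * done / total


if __name__ == "__main__":
    parts = plan_parts(12 * 1024 * 1024)   # a 12 MiB file
    print(parts)                           # three parts: 5 MiB, 5 MiB, 2 MiB
    print(f"{percent_done(parts, [1]):.1f}% after part 1")
```

With this approach the progress only moves once per finished part, so the granularity is the part size.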

The code below, which I found elsewhere, works well in Python:

import argparse
import logging
from pathlib import Path

import requests
from boto3 import Session


logging.basicConfig()
logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)


class S3MultipartUploadUtil:
    """
    AWS S3 Multipart Upload Util
    """
    def __init__(self, session: Session):
        self.session = session
        self.s3 = session.client('s3')
        self.upload_id = None
        self.bucket_name = None
        self.key = None

    def start(self, bucket_name: str, key: str):
        """
        Start Multipart Upload
        :param bucket_name:
        :param key:
        :return:
        """
        self.bucket_name = bucket_name
        self.key = key
        res = self.s3.create_multipart_upload(Bucket=bucket_name, Key=key)
        self.upload_id = res['UploadId']
        logger.debug(f"Start multipart upload '{self.upload_id}'")

    def create_presigned_url(self, part_no: int, expire: int=3600) -> str:
        """
        Create pre-signed URL for upload part.
        :param part_no:
        :param expire:
        :return:
        """
        signed_url = self.s3.generate_presigned_url(
            ClientMethod='upload_part',
            Params={'Bucket': self.bucket_name,
                    'Key': self.key,
                    'UploadId': self.upload_id,
                    'PartNumber': part_no},
            ExpiresIn=expire)
        logger.debug(f"Create presigned url for upload part '{signed_url}'")
        return signed_url

    def complete(self, parts):
        """
        Complete Multipart Uploading.
        `parts` is a list of dictionaries like the one below.
        ```
        [ {'ETag': etag, 'PartNumber': 1}, {'ETag': etag, 'PartNumber': 2}, ... ]
        ```
        You can get `ETag` from the response header of each uploaded part.
        :param parts: Sent part info.
        :return:
        """
        res = self.s3.complete_multipart_upload(
            Bucket=self.bucket_name,
            Key=self.key,
            MultipartUpload={
                'Parts': parts
            },
            UploadId=self.upload_id
        )
        logger.debug(f"Complete multipart upload '{self.upload_id}'")
        logger.debug(res)
        self.upload_id = None
        self.bucket_name = None
        self.key = None


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('target_file')
    parser.add_argument('--bucket', required=True)
    args = parser.parse_args()

    target_file = Path(args.target_file)
    bucket_name = args.bucket
    key = target_file.name
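    # S3 requires every part except the last to be at least 5 MiB.
    max_size = 5 * 1024 * 1024

    file_size = target_file.stat().st_size
    # Round up, so an exact multiple of max_size does not yield an empty extra part.
    upload_by = max(1, (file_size + max_size - 1) // max_size)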

    session = Session()
    s3util = S3MultipartUploadUtil(session)

    s3util.start(bucket_name, key)
    urls = []
    for part in range(1, upload_by + 1):
        signed_url = s3util.create_presigned_url(part)
        urls.append(signed_url)

    parts = []
    with target_file.open('rb') as fin:
        for num, url in enumerate(urls):
            part = num + 1
            file_data = fin.read(max_size)
            print(f"upload part {part} size={len(file_data)}")
            res = requests.put(url, data=file_data)
            print(res)
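            if res.status_code != 200:
                # Abort so S3 does not keep the already uploaded parts around.
                s3util.s3.abort_multipart_upload(
                    Bucket=bucket_name, Key=key, UploadId=s3util.upload_id)
                return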
            etag = res.headers['ETag']
            parts.append({'ETag': etag, 'PartNumber': part})

    print(parts)
    s3util.complete(parts)


if __name__ == '__main__':
    main()
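The script above prints one line per part but does not draw an actual progress bar. One possible way to get byte-level progress, sketched here under some assumptions, is to wrap each part's bytes in a small file-like object that reports every chunk as the HTTP client reads it. `ProgressReader` and `UploadProgress` are hypothetical names, not part of boto3 or requests. The sketch assumes that `requests.put` accepts a file-like object for `data` and takes the Content-Length from its `__len__`; the demo at the bottom only simulates the reads locally and makes no network call.

```python
import io
from typing import Callable


class UploadProgress:
    """Hypothetical helper: accumulates the bytes sent across all parts."""

    def __init__(self, total_bytes: int):
        self.total = total_bytes
        self.sent = 0

    def __call__(self, new_bytes: int) -> None:
        self.sent += new_bytes
        pct = 100.0 * self.sent / self.total if self.total else 100.0
        # "\r" redraws the same terminal line on every update.
        print(f"\r{self.sent}/{self.total} bytes ({pct:5.1f}%)", end="", flush=True)


class ProgressReader:
    """Hypothetical file-like wrapper around one part's bytes.

    Each read() reports the number of bytes handed out to `callback`.
    """

    def __init__(self, data: bytes, callback: Callable[[int], None]):
        self._buf = io.BytesIO(data)
        self._size = len(data)
        self._callback = callback

    def __len__(self) -> int:
        # Assumption: the HTTP client uses this to set Content-Length.
        return self._size

    def read(self, size: int = -1) -> bytes:
        chunk = self._buf.read(size)
        if chunk:
            self._callback(len(chunk))
        return chunk


if __name__ == "__main__":
    # Local simulation: drain two fake "parts" in 8 KiB blocks,
    # similar to how an HTTP client would read the request body.
    fake_parts = [b"a" * 20000, b"b" * 10000]
    progress = UploadProgress(sum(len(p) for p in fake_parts))
    for part_data in fake_parts:
        reader = ProgressReader(part_data, progress)
        while reader.read(8192):
            pass
    print()
```

In the upload loop this would mean creating `progress = UploadProgress(file_size)` once before the loop and passing `data=ProgressReader(file_data, progress)` instead of `data=file_data`. The callback receives the number of newly read bytes, so the same pattern could drive a tqdm bar through its `update` method instead of printing.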
