Yielding Google's MediaIoBaseDownload inside a Response

Posted 2024-09-30 08:16:06


I wrote a small Flask app that downloads files from Google Drive.

@app.route("/downloadFile/<id>")
def downloadFile(id):
    ioBytes, name, mime = gdrive.downloadFile(id)
    return send_file(ioBytes, mime, True, name)

I used the download method from the example here, with a few small changes:

def downloadFile(self, file_id):
        file = self.drive.files().get(fileId=file_id).execute()
        request = self.drive.files().get_media(fileId=file_id)
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while done is False:
            status, done = downloader.next_chunk()
            print("Downloading {} - {}%".format(file.get('name'), int(status.progress() * 100)))
        fh.seek(0)
        return (fh, file.get('name'), file.get('mimeType'))

It works as expected and downloads the file to my computer.

Now I want to deploy this Flask app to Heroku. My problem is the HTTP timeout, as described here:

HTTP requests have an initial 30 second window in which the web process must return response data

Since some of my files may take more than 30 seconds to download, this ends up being a big problem.

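I tried using the Response class with yield statements to keep sending empty bytes until the file has been downloaded and sent, using the following function: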

def sendUntilEndOfRequest(func, args=()):
    def thread():
        with app.app_context(), app.test_request_context():
            return func(*args)
    
    with concurrent.futures.ThreadPoolExecutor() as executor:
        ret = ""
        def exec():
            while ret == "":
                yield ""
                time.sleep(1)
            yield ret
        future = executor.submit(thread)
        def getValue():
            nonlocal ret
            ret = future.result()
        threading.Thread(target=getValue).start()
        return Response(stream_with_context(exec()))

I tried to make it somewhat generic, so that I can reuse it for any other function that takes more than 30 seconds to run.

Now, my download code is:

@app.route("/downloadFile/<id>")
def downloadFile(id):
    def downloadAndSendFile():
        ioBytes, name, mime = gdrive.downloadFile(id)
        return send_file(ioBytes, mime, True, name)
    return sendUntilEndOfRequest(downloadAndSendFile)

But every time I run this code, I get the following error:

127.0.0.1 - - [15/Jan/2020 20:38:06] "GET /downloadFile/1heeoEBZrhW0crgDSLbhLpcyMfvXqSmqi HTTP/1.1" 200 -
Error on request:
Traceback (most recent call last):
  File "C:\Users\fsvic\AppData\Local\Programs\Python\Python37\lib\site-packages\werkzeug\serving.py", line 303, in run_wsgi
    execute(self.server.app)
  File "C:\Users\fsvic\AppData\Local\Programs\Python\Python37\lib\site-packages\werkzeug\serving.py", line 294, in execute
    write(data)
  File "C:\Users\fsvic\AppData\Local\Programs\Python\Python37\lib\site-packages\werkzeug\serving.py", line 274, in write
    assert isinstance(data, bytes), "applications must write bytes"
AssertionError: applications must write bytes

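Apparently the file itself is downloaded correctly. I tested replacing send_file with a render_template call to check whether Flask objects could be yielded, and it worked fine. I also tested returning a string, and that worked too.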
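Editor's note on the AssertionError above: a streamed response body has to be made of bytes, so a whole response object such as the one send_file returns cannot itself be yielded from the generator. The following is only a minimal sketch of the waiting pattern using the standard library; `slow_download` and `stream_when_ready` are hypothetical names standing in for gdrive.downloadFile and sendUntilEndOfRequest, and the sketch makes no claim about whether empty chunks count as response data for Heroku's router.

```python
import concurrent.futures
import io
import time


def slow_download():
    """Hypothetical stand-in for gdrive.downloadFile: returns a BytesIO."""
    time.sleep(0.2)
    return io.BytesIO(b"file contents")


def stream_when_ready(func, poll_interval=0.05, chunk_size=4):
    """Yield only bytes: empty placeholders while func runs, then its data."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(func)
        while not future.done():
            yield b""  # always bytes, never a Response object
            time.sleep(poll_interval)
        buffer = future.result()
    # Read the finished buffer in pieces instead of yielding the object itself.
    while True:
        chunk = buffer.read(chunk_size)
        if not chunk:
            break
        yield chunk


body = list(stream_when_ready(slow_download))
```

In a Flask view, a generator like this could be passed to Response together with a mimetype argument, since the headers can no longer come from send_file.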

So, in the end, how can I retrieve the file I downloaded?


1 answer
Anonymous user
Answer #1 · Posted 2024-09-30 08:16:06

All MediaIoBaseDownload does is call the write method of the file handle. So you can implement your own IO object like this:

import io

from googleapiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools
from googleapiclient.http import MediaIoBaseDownload

from flask import Flask
from flask import Response

app = Flask(__name__)


SCOPES = 'https://www.googleapis.com/auth/drive.readonly'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
    creds = tools.run_flow(flow, store)
drive_service = discovery.build('drive', 'v3', http=creds.authorize(Http()))


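class ChunkHolder(object):
    """Write-only sink that keeps just the most recent chunk."""

    def __init__(self):
        self.chunk = None

    def write(self, chunk):
        """Save current chunk"""
        self.chunk = chunk


@app.route('/<file_id>')
def download_file(file_id):
    request = drive_service.files().get_media(fileId=file_id)

    def download_stream():
        done = False
        fh = ChunkHolder()
        # The downloader writes each fetched chunk into fh via fh.write().
        downloader = MediaIoBaseDownload(fh, request)
        while not done:
            status, done = downloader.next_chunk()
            print("Download %d%%." % int(status.progress() * 100))
            # Hand the chunk to the client right away; fh overwrites it next time.
            yield fh.chunk

    # Passing a generator makes Flask stream the body chunk by chunk.
    return Response(download_stream())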


if __name__ == '__main__':
    app.run(port=5000)

We yield each chunk as soon as it has been downloaded, and we do not keep previous chunks in memory.
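To see the chunk-holder idea in isolation, here is a small sketch that runs without Google credentials. `FakeDownloader` is a hypothetical stand-in written for this illustration: it only mimics the shape of next_chunk() (write one piece to the file handle, then return a progress value and a done flag) and is not the real MediaIoBaseDownload.

```python
class ChunkHolder:
    """Write-only sink that remembers just the most recent chunk."""

    def __init__(self):
        self.chunk = None

    def write(self, chunk):
        self.chunk = chunk


class FakeDownloader:
    """Hypothetical stand-in for MediaIoBaseDownload, for illustration only."""

    def __init__(self, fd, data, chunksize):
        self._fd = fd
        self._data = data
        self._chunksize = chunksize
        self._pos = 0

    def next_chunk(self):
        # Write the next slice to the sink, then report position and done flag.
        piece = self._data[self._pos:self._pos + self._chunksize]
        self._fd.write(piece)
        self._pos += len(piece)
        return self._pos, self._pos >= len(self._data)


def download_stream(data, chunksize):
    """Yield each chunk right after the downloader has written it."""
    fh = ChunkHolder()
    downloader = FakeDownloader(fh, data, chunksize)
    done = False
    while not done:
        _position, done = downloader.next_chunk()
        yield fh.chunk


chunks = list(download_stream(b"abcdefghij", 4))
```

Because the holder overwrites its single attribute on every write, at most one chunk is held at a time, which is the memory behaviour described above.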
