Exporting a table from BigQuery to Google Cloud Storage is very slow

import json
import uuid

# SignedJwtAssertionCredentials is only available in oauth2client releases before 2.0.
from oauth2client.client import SignedJwtAssertionCredentials
from googleapiclient.discovery import build

scope = ["https://www.googleapis.com/auth/bigquery"]
project_id = 'txxxxxxx9'
dataset_id = 'newdataset'
table_id = 'newtable2'

# Load the service account key file.
with open('/home/xxxxxxx/Dropbox/access_keys/google_storage/xxxxxxxx.json') as auth_file:
    key = json.load(auth_file)

client_email = key['client_email']
pv_key = key['private_key']
credentials = SignedJwtAssertionCredentials(client_email, pv_key, scope=scope)

bigquery_service = build('bigquery', 'v2', credentials=credentials)

# Extract job: export the table to a single CSV file in Cloud Storage.
job_data = {
    'jobReference': {
        'projectId': project_id,
        'jobId': str(uuid.uuid4())
    },
    'configuration': {
        'extract': {
            'sourceTable': {
                'projectId': project_id,
                'datasetId': dataset_id,
                'tableId': table_id,
            },
            'destinationUris': ['gs://xxxxxxx/test.csv'],
            'destinationFormat': 'CSV'
        }
    }
}

query_job = bigquery_service.jobs().insert(projectId=project_id, body=job_data).execute()

1 answer

User

#1 · Posted 2024-10-04 01:23:03

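The way you have formulated the request, a single worker writes one 300 MB CSV file, which will be fairly slow. (5 minutes is still longer than I would expect, but it is within the realm of reason.)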

If you use a glob pattern in the destination URI (for example gs://xxxxxxx/test*.csv), it should be faster, because the export can then be written in parallel.
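As a minimal sketch of that change, the request body from the question can be built with a wildcard destination URI. The helper name build_extract_job is hypothetical and only used here for illustration; the project, dataset, table, and bucket values are the placeholders from the question. The only difference from the original request is the `*` in the destination URI, which lets BigQuery split the export across several output files.

```python
import uuid


def build_extract_job(project_id, dataset_id, table_id, destination_uri):
    """Return a jobs().insert() request body for a CSV extract job.

    Hypothetical helper for illustration; it mirrors the job_data dict
    from the question and only parameterizes the destination URI.
    """
    return {
        'jobReference': {
            'projectId': project_id,
            'jobId': str(uuid.uuid4()),
        },
        'configuration': {
            'extract': {
                'sourceTable': {
                    'projectId': project_id,
                    'datasetId': dataset_id,
                    'tableId': table_id,
                },
                # The wildcard allows the export to be sharded into multiple files.
                'destinationUris': [destination_uri],
                'destinationFormat': 'CSV',
            }
        },
    }


job_data = build_extract_job(
    'txxxxxxx9', 'newdataset', 'newtable2', 'gs://xxxxxxx/test*.csv'
)
```

The resulting job_data can be passed to bigquery_service.jobs().insert(projectId=project_id, body=job_data).execute() exactly as in the question. The export then lands as several files matching the pattern rather than a single test.csv, so anything that reads the output needs to handle multiple files.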
