Exporting nested BigQuery data to Cloud Storage with Python

Published 2024-10-04 07:32:32


I am trying to export BigQuery data to Cloud Storage, but I get the error "400 Operation cannot be performed on a nested schema. Field: event_params".

Here is my code:

import os
from google.cloud import bigquery

# Credentials must be set before the client is created.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/Users/Nitin/Desktop/big_query_test/soy-serty-897-ed73.json"
client = bigquery.Client()

bucket_name = "soy-serty-897.appspot.com"
project = "soy-serty-897"
dataset_id = "analytics_157738"
table_id = "events_20190326"

destination_uri = 'gs://{}/{}'.format(bucket_name, 'basket.csv')
dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    # Location must match that of the source table.
    location='US')  # API request
extract_job.result()  # Waits for job to complete.

print('Exported {}:{}.{} to {}'.format(
    project, dataset_id, table_id, destination_uri))

2 Answers

I can't test this right now, but this might work:

from google.cloud import bigquery as bq

ejc = bq.ExtractJobConfig()
# 'NEWLINE_DELIMITED_JSON' (also available as the constant
# bq.DestinationFormat.NEWLINE_DELIMITED_JSON) supports nested fields.
ejc.destination_format = 'NEWLINE_DELIMITED_JSON'
extract_job = client.extract_table(
    table_ref,
    destination_uri,  # consider a .json filename instead of basket.csv
    # Location must match that of the source table.
    location='US',
    job_config=ejc)  # API request
extract_job.result()  # Waits for the job to complete.

The idea is to export JSON instead of CSV, since JSON supports nested data.
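To see that the nesting survives, note that each line of the exported file is one table row as a standalone JSON object, so repeated records such as event_params come through as ordinary lists and dicts. A minimal sketch of reading it back, assuming the exported file has been copied down locally as basket.json (the filename and the event_name field are illustrative of a Firebase Analytics export):

import json

with open('basket.json') as f:          # hypothetical local copy of the export
    for line in f:
        row = json.loads(line)          # one BigQuery row per line
        # In a Firebase Analytics export, event_params is typically a
        # list of {'key': ..., 'value': {...}} records.
        print(row.get('event_name'), len(row.get('event_params', [])))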

The BigQuery export limitations mention that nested and repeated data are not supported for CSV. So try exporting to Avro or JSON instead:

from google.cloud import bigquery

client = bigquery.Client()
bucket_name = 'your_bucket'
project = 'bigquery-public-data'
dataset_id = 'samples'
table_id = 'shakespeare'

destination_uri = 'gs://{}/{}'.format(bucket_name, '<your_file>')
dataset_ref = client.dataset(dataset_id, project=project)
table_ref = dataset_ref.table(table_id)

configuration = bigquery.job.ExtractJobConfig()
# For JSON:
configuration.destination_format = 'NEWLINE_DELIMITED_JSON'
# For Avro, use this instead:
# configuration.destination_format = 'AVRO'

extract_job = client.extract_table(
    table_ref,
    destination_uri,
    job_config=configuration,
    location='US')
extract_job.result()
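One related limit from the same documentation page: exports larger than 1 GB must use a wildcard in the destination URI so BigQuery can split the output across several files. A minimal sketch, reusing the configuration object above:

# BigQuery replaces the '*' with a zero-padded shard number
# (export-000000000000.json, export-000000000001.json, ...).
sharded_uri = 'gs://{}/export-*.json'.format(bucket_name)
extract_job = client.extract_table(
    table_ref,
    sharded_uri,
    job_config=configuration,
    location='US')
extract_job.result()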

Hope this helps.
