Updating Google BigQuery with pandas to_gbq and getting GenericGBQException

Published 2024-10-02 10:31:24


While trying to update a Google BigQuery table using to_gbq, I get this response:

GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

My code (the call, as it also appears in the traceback below):

gbq.to_gbq(mini_df, 'Name-of-Table', 'Project-id', chunksize=10000, reauth=False, if_exists='append', private_key=None)

My mini_df dataframe looks like:

date    request_number  name    feature_name    value_name  value
2018-01-10  1   1   "a" "b" 0.309457
2018-01-10  1   1   "c" "d" 0.273748

When I run to_gbq and the table does not yet exist on BigQuery, I can see that the table is created with the following schema:

date STRING NULLABLE
request_number STRING NULLABLE
name STRING NULLABLE
feature_name STRING NULLABLE
value_name STRING NULLABLE
value FLOAT NULLABLE
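It can help to inspect the dtypes pandas inferred before uploading, since they drive the schema that to_gbq generates. A minimal sketch (this reconstruction of mini_df from the table above is an assumption, so the inferred dtypes may differ from the original dataframe's):

```python
import pandas as pd

# Hypothetical reconstruction of mini_df from the table shown above
mini_df = pd.DataFrame({
    "date": ["2018-01-10", "2018-01-10"],
    "request_number": [1, 1],
    "name": [1, 1],
    "feature_name": ["a", "c"],
    "value_name": ["b", "d"],
    "value": [0.309457, 0.273748],
})

# String columns surface as the generic `object` dtype, which is what
# pandas-gbq maps to STRING in the generated schema
print(mini_df.dtypes)
```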

What am I doing wrong? How can I fix this?

Also, the full exception:

BadRequest                                Traceback (most recent call last)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
    589                         destination_table,
--> 590                         job_config=job_config).result()
    591                 except self.http_error as ex:

~/anaconda3/envs/env/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
    527         # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 528         return super(_AsyncJob, self).result(timeout=timeout)
    529 

~/anaconda3/envs/env/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
    110             # Pylint doesn't recognize that this is valid in this case.
--> 111             raise self._exception
    112 

BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

During handling of the above exception, another exception occurred:

GenericGBQException                       Traceback (most recent call last)
<ipython-input-28-195df93249b6> in <module>()
----> 1 gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas/io/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
    106                       chunksize=chunksize,
    107                       verbose=verbose, reauth=reauth,
--> 108                       if_exists=if_exists, private_key=private_key)

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver)
    987         table.create(table_id, table_schema)
    988 
--> 989     connector.load_data(dataframe, dataset_id, table_id, chunksize)
    990 
    991 

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
    590                         job_config=job_config).result()
    591                 except self.http_error as ex:
--> 592                     self.process_http_error(ex)
    593 
    594                 rows = []

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in process_http_error(ex)
    454         # <https://cloud.google.com/bigquery/troubleshooting-errors>`__
    455 
--> 456         raise GenericGBQException("Reason: {0}".format(ex))
    457 
    458     def run_query(self, query, **kwargs):

GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

1 Answer

I had the same problem.

In my case, it came down to the dataframe having the data type object.

I had three columns: externalId, mappingId, info. For those fields I did not set a data type and let pandas work its magic.

It decided to set all three column data types to object. The problem is that, internally, the to_gbq component uses to_json. For some reason, if a field has type object but contains only numerical values, this output omits the quotes around the values.

So Google BigQuery expected this:

{"externalId": "12345", "mappingId":"abc123", "info":"blerb"}

but got this:

{"externalId": 12345, "mappingId":"abc123", "info":"blerb"}

And since the field is mapped to STRING in Google BigQuery, the import process fails.
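The quoting behaviour described above can be reproduced locally with to_json alone (a sketch using the externalId column from the example; this is not the exact internal call pandas-gbq makes):

```python
import pandas as pd

# An object-dtype column that happens to hold only numbers
df = pd.DataFrame({"externalId": [12345]}, dtype=object)

# to_json emits the value unquoted, even though the dtype is object
print(df.to_json(orient="records"))              # [{"externalId":12345}]

# After an explicit cast to str, the value is quoted as BigQuery expects
print(df.astype(str).to_json(orient="records"))  # [{"externalId":"12345"}]
```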

There are two solutions.

Solution 1: change the data type of the column

A simple type conversion fixes the problem. I also had to change the data type in BigQuery to INTEGER.

df['externalId'] = df['externalId'].astype('int')

In that case, BigQuery can consume the field without quotes, as the JSON standard allows.

Solution 2: make sure string fields are strings

Again, this means setting the data type explicitly. But because we explicitly set it to string, the export via to_json outputs a quoted field, and everything works.

df['externalId'] = df['externalId'].astype('str')
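If several columns are affected, the cast can be applied to every object-dtype column in one pass before calling to_gbq. A sketch (the dataframe and its values are assumptions taken from the example above):

```python
import pandas as pd

# Hypothetical dataframe matching the example: all columns are object dtype,
# but externalId holds numbers, which to_json would emit unquoted
df = pd.DataFrame({
    "externalId": [12345],
    "mappingId": ["abc123"],
    "info": ["blerb"],
}, dtype=object)

# Cast every object-dtype column to real strings so to_json quotes them
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].astype(str)

print(df.to_json(orient="records"))
```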
