Updating Google BigQuery with pandas to_gbq and getting GenericGBQException

Published 2024-10-02 10:31:24


While trying to update a Google BigQuery table using to_gbq, I get this response:

GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

My code (the call, as it also appears in the traceback below):

gbq.to_gbq(mini_df, 'Name-of-Table', 'Project-id', chunksize=10000, reauth=False, if_exists='append', private_key=None)

My mini_df dataframe looks like:

date    request_number  name    feature_name    value_name  value
2018-01-10  1   1   "a" "b" 0.309457
2018-01-10  1   1   "c" "d" 0.273748

When I run to_gbq and the table does not yet exist on BigQuery, I can see that the table is created with the following schema:

date STRING NULLABLE
request_number STRING NULLABLE
name STRING NULLABLE
feature_name STRING NULLABLE
value_name STRING NULLABLE
value FLOAT NULLABLE
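It can help to inspect the dtypes pandas inferred before uploading, since they drive the schema that to_gbq generates. A minimal sketch (this reconstruction of mini_df from the table above is an assumption, so the inferred dtypes may differ from the original dataframe's):

```python
import pandas as pd

# Hypothetical reconstruction of mini_df from the table shown above
mini_df = pd.DataFrame({
    "date": ["2018-01-10", "2018-01-10"],
    "request_number": [1, 1],
    "name": [1, 1],
    "feature_name": ["a", "c"],
    "value_name": ["b", "d"],
    "value": [0.309457, 0.273748],
})

# String columns surface as the generic `object` dtype, which is what
# pandas-gbq maps to STRING in the generated schema
print(mini_df.dtypes)
```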

What am I doing wrong? How can I fix this?

Also, the full exception:

BadRequest                                Traceback (most recent call last)
~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
    589                         destination_table,
--> 590                         job_config=job_config).result()
    591                 except self.http_error as ex:

~/anaconda3/envs/env/lib/python3.6/site-packages/google/cloud/bigquery/job.py in result(self, timeout)
    527         # TODO: modify PollingFuture so it can pass a retry argument to done().
--> 528         return super(_AsyncJob, self).result(timeout=timeout)
    529 

~/anaconda3/envs/env/lib/python3.6/site-packages/google/api_core/future/polling.py in result(self, timeout)
    110             # Pylint doesn't recognize that this is valid in this case.
--> 111             raise self._exception
    112 

BadRequest: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

During handling of the above exception, another exception occurred:

GenericGBQException                       Traceback (most recent call last)
<ipython-input-28-195df93249b6> in <module>()
----> 1 gbq.to_gbq(mini_df,'Name-of-Table','Project-id',chunksize=10000,reauth=False,if_exists='append',private_key=None)

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas/io/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key)
    106                       chunksize=chunksize,
    107                       verbose=verbose, reauth=reauth,
--> 108                       if_exists=if_exists, private_key=private_key)

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in to_gbq(dataframe, destination_table, project_id, chunksize, verbose, reauth, if_exists, private_key, auth_local_webserver)
    987         table.create(table_id, table_schema)
    988 
--> 989     connector.load_data(dataframe, dataset_id, table_id, chunksize)
    990 
    991 

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in load_data(self, dataframe, dataset_id, table_id, chunksize)
    590                         job_config=job_config).result()
    591                 except self.http_error as ex:
--> 592                     self.process_http_error(ex)
    593 
    594                 rows = []

~/anaconda3/envs/env/lib/python3.6/site-packages/pandas_gbq/gbq.py in process_http_error(ex)
    454         # <https://cloud.google.com/bigquery/troubleshooting-errors>`__
    455 
--> 456         raise GenericGBQException("Reason: {0}".format(ex))
    457 
    458     def run_query(self, query, **kwargs):

GenericGBQException: Reason: 400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1.

1 Answer

I had the same problem.

In my case, it came down to the dataframe having the data type object.

I had three columns: externalId, mappingId, info. For those fields I did not set a data type and let pandas work its magic.

It decided to set all three column data types to object. The problem is that, internally, the to_gbq component uses to_json. For some reason, if a field has type object but contains only numerical values, this output omits the quotes around the values.

So Google BigQuery expected this:

{"externalId": "12345", "mappingId":"abc123", "info":"blerb"}

but got this:

{"externalId": 12345, "mappingId":"abc123", "info":"blerb"}

And since the field is mapped to STRING in Google BigQuery, the import process fails.
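The quoting behaviour described above can be reproduced locally with to_json alone (a sketch using the externalId column from the example; this is not the exact internal call pandas-gbq makes):

```python
import pandas as pd

# An object-dtype column that happens to hold only numbers
df = pd.DataFrame({"externalId": [12345]}, dtype=object)

# to_json emits the value unquoted, even though the dtype is object
print(df.to_json(orient="records"))              # [{"externalId":12345}]

# After an explicit cast to str, the value is quoted as BigQuery expects
print(df.astype(str).to_json(orient="records"))  # [{"externalId":"12345"}]
```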

There are two solutions.

Solution 1: change the data type of the column

A simple type conversion fixes the problem. I also had to change the data type in BigQuery to INTEGER.

df['externalId'] = df['externalId'].astype('int')

In that case, BigQuery can consume the field without quotes, as the JSON standard allows.

Solution 2: make sure string fields are strings

Again, this means setting the data type explicitly. But because we explicitly set it to string, the export via to_json outputs a quoted field, and everything works.

df['externalId'] = df['externalId'].astype('str')
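If several columns are affected, the cast can be applied to every object-dtype column in one pass before calling to_gbq. A sketch (the dataframe and its values are assumptions taken from the example above):

```python
import pandas as pd

# Hypothetical dataframe matching the example: all columns are object dtype,
# but externalId holds numbers, which to_json would emit unquoted
df = pd.DataFrame({
    "externalId": [12345],
    "mappingId": ["abc123"],
    "info": ["blerb"],
}, dtype=object)

# Cast every object-dtype column to real strings so to_json quotes them
for col in df.select_dtypes(include="object").columns:
    df[col] = df[col].astype(str)

print(df.to_json(orient="records"))
```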
