将CSV文件加载到BigQuery,其中一列包含所有空值

2024-10-06 12:22:34 发布

您现在位置:Python中文网/ 问答频道 /正文

我希望使用Python API将以下file附加到具有以下定义架构的BigQuery表中:

[
   {
    "name": "batsman",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "batting_team",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "bowler",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "city",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "date",
    "type": "DATE",
    "mode": "NULLABLE"
   },
   {
    "name": "delivery",
    "type": "FLOAT",
    "mode": "NULLABLE"
   },
   {
    "name": "extras",
    "type": "INTEGER",
    "mode": "NULLABLE"
   },
   {
    "name": "extras_type",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "inning",
    "type": "INTEGER",
    "mode": "NULLABLE"
   },
   {
    "name": "match_code",
    "type": "INTEGER",
    "mode": "NULLABLE"
   },
   {
    "name": "non_striker",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "player_out",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "runs",
    "type": "INTEGER",
    "mode": "NULLABLE"
   },
   {
    "name": "team1",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "team2",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "toss_decision",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "toss_winner",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "total",
    "type": "INTEGER",
    "mode": "NULLABLE"
   },
   {
    "name": "venue",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "wicket_fielders",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "wicket_kind",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "win_margin",
    "type": "INTEGER",
    "mode": "NULLABLE"
   },
   {
    "name": "win_type",
    "type": "STRING",
    "mode": "NULLABLE"
   },
   {
    "name": "winner",
    "type": "STRING",
    "mode": "NULLABLE"
   }
  ]

我用于附加到BigQuery的代码如下:

^{pr2}$

但是,每当我加载该文件时,我会收到一个错误消息:

BadRequest: 400 Invalid schema update. Field win_margin has changed type from INTEGER to STRING

普通文件看起来像this。在

我应该怎么做才能使win_margin列保持为INTEGER并且仍然能够加载包含列的所有空行的文件?在


Tags: 文件namemarginapiextrasstringmodetype
2条回答

您需要指定表模式,其中显式指定win_margin列的类型。您可以通过设置job_config.schema字段并将job_config.autodetect设置为False来完成。在

以下是可用于从文件中读取架构的函数:

def read_bigquery_schema_from_file(filepath):
    file_content = open(filepath).read()
    json_content = json.loads(file_content)
    return read_bigquery_schema_from_json_recursive(json_content)

def read_bigquery_schema_from_json_recursive(json_schema):
    """
    CAUTION: Recursive function
    This method can generate BQ schemas for nested records
    """
    result = []
    for field in json_schema:
        if field.get('type').lower() == 'record' and field.get('fields'):
            schema = SchemaField(
                name=field.get('name'),
                field_type=field.get('type', 'STRING'),
                mode=field.get('mode', 'NULLABLE'),
                description=field.get('description'),
                fields=read_bigquery_schema_from_json_recursive(field.get('fields'))
            )
        else:
            schema = SchemaField(
                name=field.get('name'),
                field_type=field.get('type', 'STRING'),
                mode=field.get('mode', 'NULLABLE'),
                description=field.get('description')
            )
        result.append(schema)
    return result

如您所见,BigQuery不允许向整数字段添加空值,因此您需要在创建文件或上载期间填充此字段,例如:

  1. 生成文件时,请确保win_margin不是空的put 0或 空
  2. 如果不可能,则需要将python代码更新为 上传前更新字段值
  3. 在工作表中创建一个公式来填充字段
  4. 将文件上载到BQ中的另一个表,然后运行SQL命令将数据从一个表移动到另一个表

相关问题 更多 >