将CSV文件加载到BigQuery，其中一列包含所有空值

[ { "name": "batsman", "type": "STRING", "mode": "NULLABLE" }, { "name": "batting_team", "type": "STRING", "mode": "NULLABLE" }, { "name": "bowler", "type": "STRING", "mode": "NULLABLE" }, { "name": "city", "type": "STRING", "mode": "NULLABLE" }, { "name": "date", "type": "DATE", "mode": "NULLABLE" }, { "name": "delivery", "type": "FLOAT", "mode": "NULLABLE" }, { "name": "extras", "type": "INTEGER", "mode": "NULLABLE" }, { "name": "extras_type", "type": "STRING", "mode": "NULLABLE" }, { "name": "inning", "type": "INTEGER", "mode": "NULLABLE" }, { "name": "match_code", "type": "INTEGER", "mode": "NULLABLE" }, { "name": "non_striker", "type": "STRING", "mode": "NULLABLE" }, { "name": "player_out", "type": "STRING", "mode": "NULLABLE" }, { "name": "runs", "type": "INTEGER", "mode": "NULLABLE" }, { "name": "team1", "type": "STRING", "mode": "NULLABLE" }, { "name": "team2", "type": "STRING", "mode": "NULLABLE" }, { "name": "toss_decision", "type": "STRING", "mode": "NULLABLE" }, { "name": "toss_winner", "type": "STRING", "mode": "NULLABLE" }, { "name": "total", "type": "INTEGER", "mode": "NULLABLE" }, { "name": "venue", "type": "STRING", "mode": "NULLABLE" }, { "name": "wicket_fielders", "type": "STRING", "mode": "NULLABLE" }, { "name": "wicket_kind", "type": "STRING", "mode": "NULLABLE" }, { "name": "win_margin", "type": "INTEGER", "mode": "NULLABLE" }, { "name": "win_type", "type": "STRING", "mode": "NULLABLE" }, { "name": "winner", "type": "STRING", "mode": "NULLABLE" } ]

2条回答

网友

1楼 · 编辑于 2024-10-06 12:22:34

您需要指定表模式，其中显式指定win_margin列的类型。您可以通过设置job_config.schema字段并将job_config.autodetect设置为False来完成。在

以下是可用于从文件中读取架构的函数：

def read_bigquery_schema_from_file(filepath):
    file_content = open(filepath).read()
    json_content = json.loads(file_content)
    return read_bigquery_schema_from_json_recursive(json_content)

def read_bigquery_schema_from_json_recursive(json_schema):
    """
    CAUTION: Recursive function
    This method can generate BQ schemas for nested records
    """
    result = []
    for field in json_schema:
        if field.get('type').lower() == 'record' and field.get('fields'):
            schema = SchemaField(
                name=field.get('name'),
                field_type=field.get('type', 'STRING'),
                mode=field.get('mode', 'NULLABLE'),
                description=field.get('description'),
                fields=read_bigquery_schema_from_json_recursive(field.get('fields'))
            )
        else:
            schema = SchemaField(
                name=field.get('name'),
                field_type=field.get('type', 'STRING'),
                mode=field.get('mode', 'NULLABLE'),
                description=field.get('description')
            )
        result.append(schema)
    return result

网友

2楼 · 编辑于 2024-10-06 12:22:34

如您所见，BigQuery不允许向整数字段添加空值，因此您需要在创建文件或上载期间填充此字段，例如：

生成文件时，请确保win_margin不是空的put 0或空
如果不可能，则需要将python代码更新为上传前更新字段值
在工作表中创建一个公式来填充字段
将文件上载到BQ中的另一个表，然后运行SQL命令将数据从一个表移动到另一个表

相关问题更多 >

编程相关推荐

热门问题

热门文章