Writing nested records to BigQuery with Java

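I want to write some nested data to BigQuery using Apache Beam, and I would like to know whether the schema I created for the BigQuery table is correct. Here is what my data looks like in XML: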

<ID>5</ID>
<Addresses>
    <Address>
        <Street>Lincoln St.</Street>
        <ZipCode>03483</ZipCode>
    </Address>
</Addresses>

Here is the BigQuery schema I created to mirror the data above:

[{
    "name": "ID",
    "type": "STRING"
  },
  {
    "name": "Addresses",
    "type": "RECORD",
    "mode": "REPEATED",
    "fields": [
      {
        "name": "Address",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
          {
            "name": "Street",
            "type": "STRING"
          },
          {
            "name": "ZipCode",
            "type": "STRING"
          }
        ]
      }
    ]
  }]

This is how I parse the structure above to build a BigQuery TableRow in Java:

List<Address> addresses = getAddresses();

if (!addresses.isEmpty()) {
    List<TableCell> repeatedRecordInstanceList = new ArrayList<>();

    for (Address address : addresses) {
        List<TableCell> childObject = new ArrayList<>();

        if (address.getStreet() != null) {
            childObject.add(new TableCell().set("Street", address.getStreet()));
        } else { childObject.add(new TableCell().set("Street", null)); }

        if (address.getZipCode() != null) {
            childObject.add(new TableCell().set("ZipCode", address.getZipCode()));
        } else { childObject.add(new TableCell().set("ZipCode", null)); }
     
        repeatedRecordInstanceList.add(new TableCell().set("Address", childObject));
    }
    tableRow.set("Addresses", repeatedRecordInstanceList);
} else {
    tableRow.set("Addresses", null);
}

But for some reason, my data looks like this in BigQuery:

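[Table: BigQuery result in which Street and ZipCode for each Address appear on separate rows, padded with nulls]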

It seems that for each Address, Street and ZipCode are written in two separate iterations.

I want each Street and its corresponding ZipCode on the same row, without any null values. How can I do that? Any help would be appreciated. Thanks.


1 answer

  1. # Answer 1

    As far as I can tell, the object your code produces could be written in JSON as:

    {
      "ID" : "5",
      "Addresses" : [
        { "Address" : [{"Street" : "abc", "ZipCode": "1564"},
                       {"Street" : "abd", "ZipCode": "1565"}]
        },
        {"Address" : [{"Street" : "abe", "ZipCode": "1566"},
                      {"Street" : "abf", "ZipCode": "1567"},
                      {"Street" : "abg", "ZipCode": "1568"}]
        }
      ]
    }
    

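    I don't think that is what you want: "Addresses" can hold multiple entries, and each of those entries then holds multiple "Address" records. I think "Address" should not have mode REPEATED (which means it is an array). That also means childObject should not be an ArrayList, because adding a new element to it appends a new entry to the array, doesn't it?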
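
One way to read the suggestion above is sketched below. It assumes the schema is changed so that "Address" is a single (non-REPEATED) RECORD inside the repeated "Addresses" field. To stay runnable without the Beam and BigQuery client libraries, the sketch uses plain `Map<String, Object>` objects in place of `TableRow`; the class name `NestedRowSketch`, the `buildRow` helper, and the small `Address` class are hypothetical stand-ins, not part of the original code. The key point is that Street and ZipCode go into the same object, instead of each being wrapped in its own list element. When there are no addresses, the sketch uses an empty list rather than null, which is an assumption about the desired behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestedRowSketch {

    /** Hypothetical stand-in for the Address class used in the question. */
    static class Address {
        private final String street;
        private final String zipCode;

        Address(String street, String zipCode) {
            this.street = street;
            this.zipCode = zipCode;
        }

        String getStreet() { return street; }
        String getZipCode() { return zipCode; }
    }

    /**
     * Builds a row in which every element of "Addresses" wraps exactly one
     * "Address" record. Plain maps stand in for TableRow here (assumption).
     */
    static Map<String, Object> buildRow(String id, List<Address> addresses) {
        List<Map<String, Object>> repeated = new ArrayList<>();

        for (Address address : addresses) {
            // One record per address: both fields live in the same object.
            // A missing value is stored as null, as in the question's code.
            Map<String, Object> child = new LinkedHashMap<>();
            child.put("Street", address.getStreet());
            child.put("ZipCode", address.getZipCode());

            // "Address" holds a single record, not a list of records.
            Map<String, Object> wrapper = new LinkedHashMap<>();
            wrapper.put("Address", child);
            repeated.add(wrapper);
        }

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", id);
        row.put("Addresses", repeated); // empty list when there are no addresses
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row =
                buildRow("5", Arrays.asList(new Address("Lincoln St.", "03483")));
        // prints {ID=5, Addresses=[{Address={Street=Lincoln St., ZipCode=03483}}]}
        System.out.println(row);
    }
}
```

In a real pipeline the same nesting would presumably be built with `TableRow` objects for `child`, `wrapper`, and `row`, keeping one object per address so that Street and ZipCode stay together.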