Using MatchFiles() in an Apache Beam pipeline to get file names and parse JSON in Python

file_content_pairs = (
    p
    | fileio.MatchFiles(known_args.input_bucket)
    | fileio.ReadMatches()
    | beam.Map(lambda file: (file.metadata.path, json.loads(file.read_utf8())))
    | beam.ParDo(TestThis())
)

1 answer

User

#1 · Posted 2024-04-27 17:12:47

I'm not sure I understand. Do you want key-value pairs of (filename, json-parsed-contents)?

If so, you would do:

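import json
import apache_beam as beam
from apache_beam.io import fileio

file_content_pairs = (
  p | fileio.MatchFiles("gs://mybucketname/*.json")
    | fileio.ReadMatches()
    # Emit (file path, parsed JSON) for each matched file.
    | beam.Map(lambda file: (file.metadata.path, json.loads(file.read_utf8())))
)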

So, if your file looks like this:

{"a":"b", "c": "d", "e": 1}

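Then the file_content_pairs collection will contain the key-value pair ("myfile.json", {"a":"b", "c": "d", "e": 1}). Note that the key is whatever file.metadata.path reports, which is typically the full matched path (for example gs://mybucketname/myfile.json) rather than the bare file name.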
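If only the base file name is wanted as the key, the path can be trimmed before pairing it with the parsed contents. The following is a minimal sketch, not part of the original answer: `to_pair` is a hypothetical helper name, and it assumes the path uses `/` separators, as `gs://` paths do.

```python
import json
import posixpath


def to_pair(path, text):
    """Return (base file name, parsed JSON) for one matched file."""
    # posixpath.basename keeps only the part after the last "/",
    # which also works for gs:// style paths.
    return posixpath.basename(path), json.loads(text)


pair = to_pair("gs://mybucketname/myfile.json", '{"a":"b", "c": "d", "e": 1}')
print(pair)
```

Inside the pipeline this could replace the lambda, for example `beam.Map(lambda f: to_pair(f.metadata.path, f.read_utf8()))`.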

If your files are in JSON Lines format (one JSON document per line), you would do the following instead:

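import json

import apache_beam as beam
from apache_beam.io import fileio

def consume_file(f):
  # query_bigquery is your own lookup function (not defined here) that maps
  # the file path to the name you want to use as the key.
  other_name = query_bigquery(f.metadata.path)
  # Emit one (name, parsed JSON) pair per line of the file.
  return [(other_name, json.loads(line))
          for line in f.read_utf8().strip().split('\n')]

with beam.Pipeline() as p:
  result = (p
            | fileio.MatchFiles("gs://mybucketname/*.json")
            | fileio.ReadMatches()
            | beam.FlatMap(consume_file))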
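One thing to watch with the line-splitting approach: a blank line in the middle of a file becomes an empty string, and `json.loads("")` raises an error. Below is a rough sketch of the per-line parsing step that skips such lines. It is plain Python with no Beam dependency so it can be tried on its own; `parse_json_lines` and the sample data are made up for illustration.

```python
import json


def parse_json_lines(key, text):
    """Return one (key, record) pair per non-empty line of text."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        pairs.append((key, json.loads(line)))
    return pairs


sample = '{"a": 1}\n\n{"a": 2}\n'
records = parse_json_lines("myfile.json", sample)
print(records)
```

In the pipeline, the function passed to `beam.FlatMap` could call this as `parse_json_lines(f.metadata.path, f.read_utf8())`, so each parsed line becomes its own element.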
