如何从googledataflow中的PCollection中获取元素列表并在管道中使用它来循环写入转换？

class ToUniqueList(beam.CombineFn): def create_accumulator(self): return [] def add_input(self, accumulator, element): if element not in accumulator: accumulator.append(element) return accumulator def merge_accumulators(self, accumulators): return list(set(accumulators)) def extract_output(self, accumulator): return accumulator def get_list_of_dates(pcoll): return (pcoll | 'get the list of dates' >> beam.CombineGlobally(ToUniqueList()))

1条回答

网友

1楼 · 发布于 2024-10-01 13:26:07

直接获取PCollection的内容是不可能的——apachebeam或数据流管道更像是一个关于应该进行什么处理的查询计划，PCollection是计划中的逻辑中间节点，而不是包含数据。主程序汇编计划（管道）并启动它。在

但是，最终您将尝试将数据写入按日期分片的BigQuery表。此用例目前仅支持in the Java SDK，并且仅支持流式管道。在

对于根据数据向多个目的地写入数据的更一般的处理方法，请遵循BEAM-92。在

另请参见Creating/Writing to Parititoned BigQuery table via Google Cloud Dataflow

相关问题更多 >

编程相关推荐

热门问题

热门文章