回答此问题可获得 20 贡献值,回答如果被采纳可获得 50 分。
<p>在apachebeam中,在Pcollection之后左连接哪个更好?在</p>
<pre><code>pcoll1 = [('key1', [[('a', 1)],[('b', 2)], [('c', 3)], [('d', 4)],[('e', 5)], [('f', 6)]]), ('key2',[[('a', 12)],[('b', 21)], [('c', 13)]]), ('key3',[[('a', 21)],[('b', 23)], [('c', 31)]])]
pcoll2 = [('key1', [[('x', 10)]]), ('key2', [[('x', 20)]])]
</code></pre>
<p>预期输出为</p>
^{pr2}$
<p>我使用CoGroupByKey()和Pardo()实现了一个左joiner。在beampythonsdk中有没有其他实现左joiner的方法?在</p>
<pre><code>left_joined = (
{'left': pcoll1, 'right': pcoll2}
| 'LeftJoiner: Combine' >> beam.CoGroupByKey()
| 'LeftJoiner: ExtractValues' >> beam.Values()
| 'LeftJoiner: JoinValues' >> beam.ParDo(LeftJoinerFn())
)
class LeftJoinerFn(beam.DoFn):
def __init__(self):
super(LeftJoinerFn, self).__init__()
def process(self, row, **kwargs):
left = row['left']
right = row['right']
if left and right:
for each in left:
yield each + right[0]
elif left:
for each in left:
yield each
</code></pre>