AWS Glue E

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "testdb", table_name = "table1", transformation_ctx = "datasource0")
datasource1 = glueContext.create_dynamic_frame.from_catalog(database = "testdb", table_name = "reftable", transformation_ctx = "datasource1")
datasource2 = datasource1.join(["aaaaaaaaaid"], ["aaaaaaaaaid"], datasource0, transformation_ctx = "join")
datasink2 = glueContext.write_dynamic_frame.from_options(frame = datasource2, connection_type = "s3", connection_options = {"path": "s3://testing/Output"}, format = "csv", transformation_ctx = "datasink2")
job.commit()

2 answers

User

#1 · Edited 2024-09-28 05:29:03

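I know this is an old question, but here is what worked for me:

Convert both DynamicFrames to DataFrames.
Join the DataFrames.

This is described in https://stackoverflow.com/a/54362245/8622986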
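
The two steps above could look roughly like the sketch below. It is not taken from the linked answer; the helper name join_via_dataframes, the inner join type, and the "joined" frame name are assumptions made for illustration. It relies on DynamicFrame.toDF(), the Spark DataFrame.join method, and DynamicFrame.fromDF() to convert the result back so it can still be written with write_dynamic_frame.

```python
def join_via_dataframes(glue_context, left_dyf, right_dyf, key, how="inner"):
    """Join two Glue DynamicFrames by going through Spark DataFrames.

    Hypothetical helper: the name, the default join type, and the
    "joined" frame name are illustrative assumptions.
    """
    # awsglue is only available inside the Glue job runtime, so it is
    # imported here rather than at module level.
    from awsglue.dynamicframe import DynamicFrame

    # Step 1: convert both DynamicFrames to Spark DataFrames.
    left_df = left_dyf.toDF()
    right_df = right_dyf.toDF()

    # Step 2: join them. Passing the key as a column name keeps a single
    # copy of the key column in the result.
    joined_df = left_df.join(right_df, on=key, how=how)

    # Convert back so the result can be passed to write_dynamic_frame.
    return DynamicFrame.fromDF(joined_df, glue_context, "joined")
```

In the script from the question, this would be called as datasource2 = join_via_dataframes(glueContext, datasource0, datasource1, "aaaaaaaaaid"), replacing the datasource1.join(...) line.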

Hope this helps.

User

#2 · Edited 2024-09-28 05:29:03

Can you try this:

datasource2 = Join.apply(datasource0, datasource1, 'aaaaaaaaaid', 'aaaaaaaaaid')

This should work. Please let me know whether it solves the problem, and if it does, please accept/upvote the answer.

Regards,

Yuva
