Python client support for running Hive on Amazon EMR

Posted 2024-10-01 17:41:26

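I noticed that neither mrjob nor boto offers a Python interface for submitting and running Hive jobs on Amazon Elastic MapReduce (EMR). Are there any other Python client libraries that support running Hive on EMR?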


Tags: boto, hive, emr, mrjob
1 answer
User
#1 · Posted 2024-10-01 17:41:26

With boto, you can do the following:

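from boto.emr.connection import EmrConnection
from boto.emr.step import JarStep

# s3_query_file_uri, s3_log_uri, master_instance_type, slave_instance_type
# and num_instances are assumed to be defined by the caller.

# Arguments for the step that installs Hive on the cluster
args1 = [u's3://us-east-1.elasticmapreduce/libs/hive/hive-script',
         u'--base-path',
         u's3://us-east-1.elasticmapreduce/libs/hive/',
         u'--install-hive',
         u'--hive-versions',
         u'0.7']
# Arguments for the step that runs the Hive query file stored in S3
args2 = [u's3://us-east-1.elasticmapreduce/libs/hive/hive-script',
         u'--base-path',
         u's3://us-east-1.elasticmapreduce/libs/hive/',
         u'--hive-versions',
         u'0.7',
         u'--run-hive-script',
         u'--args',
         u'-f',
         s3_query_file_uri]
steps = []
for name, args in zip(('Setup Hive', 'Run Hive Script'), (args1, args2)):
    step = JarStep(name,
                   's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar',
                   step_args=args,
                   # action_on_failure="CANCEL_AND_WAIT"
                   )
    # Append inside the loop so that both steps are added
    steps.append(step)
# Kick off the job.
# Note: `name` still holds the last step name from the loop above
# ('Run Hive Script'), so that becomes the job flow name.
jobid = EmrConnection().run_jobflow(name, s3_log_uri,
                                    steps=steps,
                                    master_instance_type=master_instance_type,
                                    slave_instance_type=slave_instance_type,
                                    num_instances=num_instances,
                                    hadoop_version="0.20")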
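
The two argument lists above differ only in their trailing options, so they can be built by one small helper. The following is a minimal sketch, not part of the original answer: the function name hive_step_args, the constants HIVE_LIBS and HIVE_SCRIPT, and the example query URI are all hypothetical, and it assumes the hive-script options take the usual `--` prefix. It needs no boto import, so it can be checked locally before any cluster is started.

```python
# Hypothetical helper (not part of boto): builds the script-runner
# arguments for the two Hive steps used in the answer above.
HIVE_LIBS = 's3://us-east-1.elasticmapreduce/libs/hive/'
HIVE_SCRIPT = HIVE_LIBS + 'hive-script'


def hive_step_args(query_uri=None, hive_version='0.7'):
    """Return the argument list for a Hive step.

    Without query_uri the arguments install Hive; with query_uri
    they run that query file via `-f`.
    """
    args = [HIVE_SCRIPT, '--base-path', HIVE_LIBS]
    if query_uri is None:
        return args + ['--install-hive', '--hive-versions', hive_version]
    return args + ['--hive-versions', hive_version,
                   '--run-hive-script', '--args', '-f', query_uri]


# Example usage; the bucket and file name below are made up.
setup_args = hive_step_args()
run_args = hive_step_args('s3://example-bucket/queries/report.q')
```

The resulting lists can be passed as step_args to JarStep in place of args1 and args2.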
