EMR Cluster creation fails on the step

【Title】: EMR Cluster creation fails on the step 【Posted】: 2019-11-21 20:45:59 【Question】:

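My first attempt at creating an EMR cluster from a Lambda function fails with the error below. I intend to use script-runner.jar to launch a Python script located in an S3 bucket. Can someone help me understand this error? What exactly am I missing?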

2019-11-21T20:34:59.990Z INFO Ensure step 1 jar file s3a://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar
INFO Failed to download: s3a://<region>.elasticmapreduce/libs/script-runner/script-runner.jar
java.io.IOException: Unable to download 's3a://<region>.elasticmapreduce/libs/script-runner/script-runner.jar'. Only s3 + local files are supported
    at aws157.instancecontroller.util.S3Wrapper.fetchHadoopFileToLocal(S3Wrapper.java:353)
    at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner$Runner.<init>(HadoopJarStepRunner.java:243)
    at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner.createRunner(HadoopJarStepRunner.java:152)
    at aws157.instancecontroller.master.steprunner.HadoopJarStepRunner.createRunner(HadoopJarStepRunner.java:146)
    at aws157.instancecontroller.master.steprunner.StepExecutor.runStep(StepExecutor.java:136)
    at aws157.instancecontroller.master.steprunner.StepExecutor.run(StepExecutor.java:70)
    at aws157.instancecontroller.master.steprunner.StepExecutionManager.enqueueStep(StepExecutionManager.java:246)
    at aws157.instancecontroller.master.steprunner.StepExecutionManager.doRun(StepExecutionManager.java:193)
    at aws157.instancecontroller.master.steprunner.StepExecutionManager.access$000(StepExecutionManager.java:33)
    at aws157.instancecontroller.master.steprunner.StepExecutionManager$1.run(StepExecutionManager.java:94)

My loosely written Lambda function is as follows:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
import boto3
import datetime


def lambda_handler(event, context):
    print('Creating EMR')
    connection = boto3.client('emr', region_name='us-east-1')
    print(event)

    cluster_id = connection.run_job_flow(
        Name='MyTest',
        VisibleToAllUsers=True,
        JobFlowRole='EMR_EC2_DefaultRole',
        ServiceRole='EMR_DefaultRole',
        LogUri='s3://bucket-emr/logs',
        ReleaseLabel='emr-5.21.0',
        Applications=[{'Name': 'Hadoop'}, {'Name': 'Spark'}],
        Instances={
            'InstanceGroups': [
                {
                    'Name': 'Master nodes',
                    'Market': 'ON_DEMAND',
                    'InstanceRole': 'MASTER',
                    'InstanceType': 'm3.xlarge',
                    'InstanceCount': 1,
                },
                {
                    'Name': 'Slave nodes',
                    'Market': 'SPOT',
                    'InstanceRole': 'CORE',
                    'InstanceType': 'm3.xlarge',
                    'InstanceCount': 2,
                },
            ],
            'KeepJobFlowAliveWhenNoSteps': True,
            'Ec2KeyName': 'keys-kvp',
            'Ec2SubnetId': 'subnet-dsb65490',
            'EmrManagedMasterSecurityGroup': 'sg-0daa54d041d1033',
            'EmrManagedSlaveSecurityGroup': 'sg-0daa54d041d1033',
        },
        Configurations=[{
            "Classification": "spark-env",
            "Properties": {},
            "Configurations": [{
                "Classification": "export",
                "Properties": {
                    "PYSPARK_PYTHON": "python36",
                    "PYSPARK_DRIVER_PYTHON": "python36"
                }
            }]
        }],
        Steps=[{
            'Name': 'mystep',
            'ActionOnFailure': 'TERMINATE_CLUSTER',
            'HadoopJarStep': {
                'Jar': 's3a://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar',
                'Args': [
                    '/home/hadoop/spark/bin/spark-submit', '--deploy-mode', 'cluster', '--master', 'yarn', 's3a://inscape-script/wordcount.py',
                ]
            }
        }]
    )

    return 'Started cluster {}'.format(cluster_id)

What am I missing when creating the cluster? Thanks in advance.

【Answer 1】:

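The log says "Only s3 + local files are supported", so you can try changing your 'Jar' parameter to use the s3:// scheme instead of s3a://: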

'Jar': 's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar',

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-script.html
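For illustration, here is a minimal sketch of how the step entry might look with that change applied. The helper name `build_script_runner_step` is hypothetical (it is not part of boto3), and the Args are copied unchanged from the question, so this only demonstrates the scheme change in 'Jar' and does not verify that the arguments themselves are right for your cluster.

```python
def build_script_runner_step(region, script_args, name='mystep'):
    """Build one Steps entry that loads script-runner.jar via the s3:// scheme.

    Hypothetical helper for illustration; the returned dict has the same
    shape as the Steps entry passed to run_job_flow in the question.
    """
    jar = 's3://{}.elasticmapreduce/libs/script-runner/script-runner.jar'.format(region)
    return {
        'Name': name,
        'ActionOnFailure': 'TERMINATE_CLUSTER',
        'HadoopJarStep': {
            'Jar': jar,                 # s3://, not s3a://
            'Args': list(script_args),  # passed through unchanged
        },
    }


# Args taken as-is from the question (an assumption, not verified here).
step = build_script_runner_step('us-east-1', [
    '/home/hadoop/spark/bin/spark-submit', '--deploy-mode', 'cluster',
    '--master', 'yarn', 's3a://inscape-script/wordcount.py',
])
```

The resulting `step` could then be passed as `Steps=[step]` in the `run_job_flow` call.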

You can also try command-runner instead, by changing the 'Jar' parameter to:

/var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar
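As a rough sketch of the command-runner variant, the step entry might be built as below. The helper name `build_command_runner_step` is hypothetical, and starting Args with a bare `spark-submit` (rather than the full path used in the question) is an assumption about how the command would be invoked; the script URI is the one from the question.

```python
COMMAND_RUNNER_JAR = '/var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar'


def build_command_runner_step(script_uri, name='mystep'):
    """Build one Steps entry that runs spark-submit through command-runner.jar.

    Hypothetical helper for illustration only; the Args layout is an
    assumption and may need adjusting for your cluster.
    """
    return {
        'Name': name,
        'ActionOnFailure': 'TERMINATE_CLUSTER',
        'HadoopJarStep': {
            'Jar': COMMAND_RUNNER_JAR,
            'Args': [
                'spark-submit', '--deploy-mode', 'cluster',
                '--master', 'yarn', script_uri,
            ],
        },
    }


# Script location copied from the question.
step = build_command_runner_step('s3a://inscape-script/wordcount.py')
```

As with the script-runner version, the dict would be passed as `Steps=[step]` to `run_job_flow`.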
