How to set Amazon AMI's Hadoop configuration using Java code

Posted: 2013-05-31 09:52:41

Question:

I want to set the configuration textinputformat.record.delimiter=; for Hadoop.
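For context, this property tells Hadoop's TextInputFormat to split input into records on a custom delimiter instead of newlines. A minimal plain-Java sketch (not using the Hadoop API) of the effect of delimiter ";":

```java
import java.util.Arrays;
import java.util.List;

public class DelimiterDemo {
    public static void main(String[] args) {
        // With textinputformat.record.delimiter=";" Hadoop's TextInputFormat
        // would split this input into three records instead of one line.
        String input = "record one;record two;record three";
        List<String> records = Arrays.asList(input.split(";"));
        System.out.println(records.size());   // 3
        System.out.println(records.get(1));   // record two
    }
}
```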

Right now I run a Pig script on the AMI with the code below. Does anyone know how to set this configuration using the following code?

Code:

StepConfig installPig = new StepConfig()
.withName("Install Pig")
.withActionOnFailure(ActionOnFailure.TERMINATE_JOB_FLOW.name())
.withHadoopJarStep(stepFactory.newInstallPigStep());

// Configure pig script

String[] scriptArgs = new String[] { "-p", input, "-p", output };
StepConfig runPigLatinScript = new StepConfig()
.withName("Run Pig Script")
.withActionOnFailure(ActionOnFailure.CANCEL_AND_WAIT.name())
.withHadoopJarStep(stepFactory.newRunPigScriptStep("s3://pig/script.pig", scriptArgs));

// Configure JobFlow

RunJobFlowRequest request = new RunJobFlowRequest()
.withName(jobFlowName)
.withSteps(installPig, runPigLatinScript)
.withLogUri(logUri)
.withAmiVersion("2.3.2")
.withInstances(new JobFlowInstancesConfig()
            .withEc2KeyName(this.ec2KeyName)
            .withInstanceCount(this.count)
            .withKeepJobFlowAliveWhenNoSteps(false)
            .withMasterInstanceType(this.masterType)
            .withSlaveInstanceType(this.slaveType));
// Run JobFlow
RunJobFlowResult runJobFlowResult = this.amazonEmrClient.runJobFlow(request);

Comments:

Answer 1:

What you need to do is create a BootstrapActionConfig and add it to the RunJobFlowRequest being created; the bootstrap action then applies your custom Hadoop configuration to the cluster.

Here is the complete code I wrote for you after editing the code here:

import java.util.ArrayList;
import java.util.List;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.BootstrapActionConfig;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
import com.amazonaws.services.elasticmapreduce.model.ScriptBootstrapActionConfig;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;

/**
 * 
 * @author amar
 * 
 */
public class RunEMRJobFlow {

    private static final String CONFIG_HADOOP_BOOTSTRAP_ACTION = "s3://elasticmapreduce/bootstrap-actions/configure-hadoop";

    public static void main(String[] args) {

        String accessKey = "";
        String secretKey = "";
        AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
        AmazonElasticMapReduceClient emr = new AmazonElasticMapReduceClient(credentials);

        StepFactory stepFactory = new StepFactory();

        StepConfig enabledebugging = new StepConfig().withName("Enable debugging")
                .withActionOnFailure("TERMINATE_JOB_FLOW").withHadoopJarStep(stepFactory.newEnableDebuggingStep());

        StepConfig installHive = new StepConfig().withName("Install Hive").withActionOnFailure("TERMINATE_JOB_FLOW")
                .withHadoopJarStep(stepFactory.newInstallHiveStep());
        List<String> setMappersArgs = new ArrayList<String>();
        setMappersArgs.add("-s");
        setMappersArgs.add("textinputformat.record.delimiter=;");

        BootstrapActionConfig mappersBootstrapConfig = createBootstrapAction("Set Hadoop Config",
                CONFIG_HADOOP_BOOTSTRAP_ACTION, setMappersArgs);

        RunJobFlowRequest request = new RunJobFlowRequest()
                .withBootstrapActions(mappersBootstrapConfig)
                .withName("Hive Interactive")
                .withSteps(enabledebugging, installHive)
                .withLogUri("s3://myawsbucket/")
                .withInstances(
                        new JobFlowInstancesConfig().withEc2KeyName("keypair").withHadoopVersion("0.20")
                                .withInstanceCount(5).withKeepJobFlowAliveWhenNoSteps(true)
                                .withMasterInstanceType("m1.small").withSlaveInstanceType("m1.small"));

        RunJobFlowResult result = emr.runJobFlow(request);
    }

    private static BootstrapActionConfig createBootstrapAction(String bootstrapName, String bootstrapPath,
            List<String> args) {

        ScriptBootstrapActionConfig bootstrapScriptConfig = new ScriptBootstrapActionConfig();
        bootstrapScriptConfig.setPath(bootstrapPath);

        if (args != null) {
            bootstrapScriptConfig.setArgs(args);
        }

        BootstrapActionConfig bootstrapConfig = new BootstrapActionConfig();
        bootstrapConfig.setName(bootstrapName);
        bootstrapConfig.setScriptBootstrapAction(bootstrapScriptConfig);

        return bootstrapConfig;
    }
}
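To see what the configure-hadoop bootstrap action does with the "-s" arguments above, here is a simplified, hypothetical sketch (plain Java, no AWS dependencies): each "-s key=value" pair is merged into the cluster's Hadoop site configuration, which is modeled here as a Map.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigureHadoopArgsDemo {
    // Simplified model: parse repeated "-s key=value" pairs the way the
    // configure-hadoop bootstrap action merges them into Hadoop's config.
    static Map<String, String> parse(String... args) {
        Map<String, String> conf = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i += 2) {
            if ("-s".equals(args[i])) {
                String[] kv = args[i + 1].split("=", 2);
                conf.put(kv[0], kv[1]);
            }
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = parse("-s", "textinputformat.record.delimiter=;");
        System.out.println(conf.get("textinputformat.record.delimiter")); // ;
    }
}
```

This is only an illustration of the argument convention; on a real cluster the bootstrap action writes the pairs into the Hadoop site configuration files before the job flow starts.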
Discussion:
