EMR Cluster: Exactly one of MasterInstanceFleet and MasterInstanceGroup must be specified

Posted: 2021-08-07 11:16:19

Question:

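I am trying to create an EMR cluster through the CDK, but I get the error in the title even though I specify both master_instance_fleet and master_instance_group. I can't figure out what the problem is. Can someone please help?

Here is my code: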

    on_demand = _emr.CfnCluster.OnDemandProvisioningSpecificationProperty(allocation_strategy="On-demand")

    master_instance_fleet = _emr.CfnCluster.InstanceFleetConfigProperty(
        instance_type_configs=[_emr.CfnCluster.InstanceTypeConfigProperty(
            instance_type="m5.xlarge",
            weighted_capacity=1,
            ebs_configuration=_emr.CfnCluster.EbsConfigurationProperty(
                ebs_block_device_configs=[
                    _emr.CfnCluster.EbsBlockDeviceConfigProperty(
                        volume_specification=_emr.CfnCluster.VolumeSpecificationProperty(
                            size_in_gb=64,
                            volume_type="EBS Storage"))]
            )
        )],
        launch_specifications=_emr.CfnCluster.InstanceFleetProvisioningSpecificationsProperty(
            on_demand_specification=on_demand),
        name="MASTER",
        target_on_demand_capacity=64,
    )

    master_instance_group = _emr.CfnCluster.InstanceGroupConfigProperty(
        instance_count=1,
        instance_type="m5.xlarge",
        name="core",
        ebs_configuration=_emr.CfnCluster.EbsConfigurationProperty(
            ebs_block_device_configs=[_emr.CfnCluster.EbsBlockDeviceConfigProperty(
                volume_specification=_emr.CfnCluster.VolumeSpecificationProperty(
                    size_in_gb=64,
                    volume_type="EBS Storage"))]
        )
    )

    instances = _emr.CfnCluster.JobFlowInstancesConfigProperty(
        master_instance_fleet=master_instance_fleet,
        master_instance_group=master_instance_group,
        additional_master_security_groups=default_security_groups,
        additional_slave_security_groups=default_security_groups,
        core_instance_fleet=master_instance_fleet,
        core_instance_group=master_instance_group,
        ec2_subnet_id=default_vpc_subnets[1],
        ec2_subnet_ids=default_vpc_subnets,
        ec2_key_name="mykey",
        hadoop_version="3.2.1",
    )

    application_properties = _emr.CfnCluster.ApplicationProperty(
        name="Hadoop",
        version="3.2.1",
    )

    _emr.CfnCluster(self,
                    id="myEMRCluster",
                    name="myEMRCluster",
                    instances=instances,
                    job_flow_role="EMR_DefaultRole",
                    service_role="EMR_EC2_DefaultRole",
                    auto_scaling_role="EMR_AutoScaling_DefaultRole",
                    visible_to_all_users=True,
                    ebs_root_volume_size=50,
                    applications=[application_properties]
                    )

Answer 1:

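The code above needs a small change. According to the AWS documentation, you must specify either the master instance fleet or the master instance group, not both. The same applies to the core instance fleet and the core instance group. For example, using instance groups only: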

    instances = _emr.CfnCluster.JobFlowInstancesConfigProperty(
        master_instance_group=master_instance_group,
        additional_master_security_groups=[emr_security_group.security_group_id],
        additional_slave_security_groups=[emr_security_group.security_group_id],
        core_instance_group=master_instance_group,
        ec2_subnet_ids=default_vpc_subnets,
        ec2_key_name=self.node.try_get_context(stage)["ec2_key_name"],
        hadoop_version=self.node.try_get_context(stage)["hadoop_version"],
    )

This creates the cluster with On-Demand instances.
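To catch this mistake before deployment, the "exactly one of" rule can be checked in plain Python before the arguments reach JobFlowInstancesConfigProperty. The following is only a minimal sketch: check_instances_kwargs is a hypothetical helper, not part of aws_cdk, and its keyword names simply mirror the arguments used in the question. It also assumes that a cluster uses either instance fleets or instance groups throughout, never a mix of the two.

```python
from typing import Any, Dict

# Hypothetical helper (not part of aws_cdk). Each pair lists the two mutually
# exclusive keyword arguments for one node role.
_PAIRS = [
    ("master_instance_fleet", "master_instance_group"),
    ("core_instance_fleet", "core_instance_group"),
]


def check_instances_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return kwargs unchanged if the fleet/group choice is consistent."""
    for fleet_key, group_key in _PAIRS:
        role = fleet_key.split("_")[0]  # "master" or "core"
        has_fleet = kwargs.get(fleet_key) is not None
        has_group = kwargs.get(group_key) is not None
        if has_fleet and has_group:
            raise ValueError(f"{role}: pass {fleet_key} or {group_key}, not both")
        # The master node is mandatory; core nodes are treated as optional here.
        if role == "master" and not (has_fleet or has_group):
            raise ValueError(f"master: pass one of {fleet_key} or {group_key}")

    # Assumption: fleets and groups are not mixed within one cluster.
    uses_fleets = any(kwargs.get(fleet) is not None for fleet, _ in _PAIRS)
    uses_groups = any(kwargs.get(group) is not None for _, group in _PAIRS)
    if uses_fleets and uses_groups:
        raise ValueError("use instance fleets or instance groups, not a mix")
    return kwargs
```

In the stack, the result could then be unpacked into the constructor, for example `_emr.CfnCluster.JobFlowInstancesConfigProperty(**check_instances_kwargs({...}))`, so that the configuration from the question fails with a clear local message instead of a deployment error.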
