EMR Cluster: Exactly one of MasterInstanceFleet and MasterInstanceGroup must be specified
[Posted]: 2021-08-07 11:16:19
[Question]: I am trying to create an EMR cluster through the CDK, but I get the error "Exactly one of MasterInstanceFleet and MasterInstanceGroup must be specified" even after specifying both master_instance_fleet and master_instance_group. I can't figure out what the problem is. Could someone please help?
Here is my code:
`on_demand = _emr.CfnCluster.OnDemandProvisioningSpecificationProperty(allocation_strategy="On-demand")
master_instance_fleet = _emr.CfnCluster.InstanceFleetConfigProperty(
    instance_type_configs=[_emr.CfnCluster.InstanceTypeConfigProperty(
        instance_type="m5.xlarge",
        weighted_capacity=1,
        ebs_configuration=_emr.CfnCluster.EbsConfigurationProperty(
            ebs_block_device_configs=[
                _emr.CfnCluster.EbsBlockDeviceConfigProperty(
                    volume_specification=_emr.CfnCluster.VolumeSpecificationProperty(
                        size_in_gb=64,
                        volume_type="EBS Storage"))]
        )
    )],
    launch_specifications=_emr.CfnCluster.InstanceFleetProvisioningSpecificationsProperty(
        on_demand_specification=on_demand),
    name="MASTER",
    target_on_demand_capacity=64,
)
master_instance_group = _emr.CfnCluster.InstanceGroupConfigProperty(
    instance_count=1,
    instance_type="m5.xlarge",
    name="core",
    ebs_configuration=_emr.CfnCluster.EbsConfigurationProperty(
        ebs_block_device_configs=[_emr.CfnCluster.EbsBlockDeviceConfigProperty(
            volume_specification=_emr.CfnCluster.VolumeSpecificationProperty(
                size_in_gb=64,
                volume_type="EBS Storage"))]
    )
)
instances = _emr.CfnCluster.JobFlowInstancesConfigProperty(
    master_instance_fleet=master_instance_fleet,
    master_instance_group=master_instance_group,
    additional_master_security_groups=default_security_groups,
    additional_slave_security_groups=default_security_groups,
    core_instance_fleet=master_instance_fleet,
    core_instance_group=master_instance_group,
    ec2_subnet_id=default_vpc_subnets[1],
    ec2_subnet_ids=default_vpc_subnets,
    ec2_key_name="mykey",
    hadoop_version="3.2.1",
)
application_properties = _emr.CfnCluster.ApplicationProperty(
    name="Hadoop",
    version="3.2.1",
)
_emr.CfnCluster(self,
    id="myEMRCluster",
    name="myEMRCluster",
    instances=instances,
    job_flow_role="EMR_DefaultRole",
    service_role="EMR_EC2_DefaultRole",
    auto_scaling_role="EMR_AutoScaling_DefaultRole",
    visible_to_all_users=True,
    ebs_root_volume_size=50,
    applications=[application_properties]
)
`
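[Answer 1]: The code above needs a small change. According to the AWS documentation, you should specify either the master instance fleet or the master instance group, not both, as shown below: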
instances = _emr.CfnCluster.JobFlowInstancesConfigProperty(
    master_instance_group=master_instance_group,
    additional_master_security_groups=[emr_security_group.security_group_id],
    additional_slave_security_groups=[emr_security_group.security_group_id],
    core_instance_group=master_instance_group,
    ec2_subnet_ids=default_vpc_subnets,
    ec2_key_name=self.node.try_get_context(stage)["ec2_key_name"],
    hadoop_version=self.node.try_get_context(stage)["hadoop_version"]
)
This creates the cluster with on-demand instances.
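The "exactly one of" rule can also be checked in plain Python before the stack is synthesized, so the mistake surfaces earlier than the deployment error. The following is only a minimal sketch: `pick_exactly_one` is a hypothetical helper and not part of the CDK, and the dictionary stands in for the real `InstanceGroupConfigProperty` object.

```python
from typing import Any, Dict, Optional


def pick_exactly_one(**options: Optional[Any]) -> Dict[str, Any]:
    """Return the single option that is not None; raise if zero or several are set."""
    chosen = {name: value for name, value in options.items() if value is not None}
    if len(chosen) != 1:
        names = " and ".join(options)
        raise ValueError(
            f"Exactly one of {names} must be specified, got {len(chosen)}"
        )
    return chosen


# Placeholder value (assumption): stands in for the real CfnCluster property object.
master_instance_group = {"instance_count": 1, "instance_type": "m5.xlarge"}

# Passing both a fleet and a group here would raise ValueError.
master_kwargs = pick_exactly_one(
    master_instance_fleet=None,
    master_instance_group=master_instance_group,
)
print(sorted(master_kwargs))  # prints ['master_instance_group']
```

The returned dictionary holds only the keyword that was set, so it could be unpacked into the `JobFlowInstancesConfigProperty(...)` call alongside the other arguments. The same check would apply to the `core_instance_fleet` / `core_instance_group` pair, which the answer above also reduces to a single entry.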