Amazon ECS not scaling out instances

【Title】: Amazon ECS not scaling out instances 【Posted】: 2019-02-25 12:41:49 【Question】:

Can anyone explain why my ECS stack is not scaling out to new EC2 instances?

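I provisioned my ECS stack with CloudFormation. The initial provisioning works fine. As soon as the stack starts, a process kicks off that keeps the CPU load above 90%, so that the scale-out alarm fires for testing purposes.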

I set up a scaling alarm that triggers the scaling policy when CPU > 15%, and when CPU [...]

The event log then reports the following:

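Message: service ECSService-12BBO1EE3SRUF was unable to place a task because no container instance met all of its requirements. The closest matching container instance 149e8eea-a8bc-433f-abbb-9a49c3a3c5b5 has insufficient memory available. For more information, see the Troubleshooting section.

Message: Successfully set desired count to 2. Waiting for change to be fulfilled by ecs. Cause: monitor alarm CPU utilization greater than 5% in state ALARM triggered policy ServiceScaleOutPolicy 155194fc-ee07-46ff-a822-018bd704602b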

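It looks like ECS is trying to place more tasks on the same instance instead of increasing the number of instances and placing the new task on a new instance. How do I get ECS to scale out to new instances and place new tasks on them?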

My CloudFormation scaling configuration looks like this:

  ECSAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    DependsOn: ECSALB
    Properties:
      VPCZoneIdentifier: !Ref 'SubnetId'
      LaunchConfigurationName: !Ref 'ContainerInstances'
      MinSize: !Ref 'DesiredCapacity'
      MaxSize: !Ref 'MaxSize'
      DesiredCapacity: !Ref 'DesiredCapacity'
      HealthCheckGracePeriod: 320
    CreationPolicy:
      ResourceSignal:
        Timeout: PT15M
    UpdatePolicy:
      AutoScalingReplacingUpdate:
        WillReplace: 'true'
      AutoScalingRollingUpdate:
        MinInstancesInService: '1'
        MaxBatchSize: '1'
        PauseTime: PT15M
        WaitOnResourceSignals: 'true'

  ServiceScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    DependsOn: ECSService
    Properties:
      MaxCapacity: 3
      MinCapacity: 1
      ResourceId: !Join ['', [service/, !Ref 'ECSCluster', /, !GetAtt [ECSService, Name]]]
      RoleARN: !GetAtt [AutoscalingRole, Arn]
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs

  ServiceScaleOutPolicy:
    Type : "AWS::ApplicationAutoScaling::ScalingPolicy"
    Properties:
      PolicyName: ServiceScaleOutPolicy
      PolicyType: StepScaling
      ScalingTargetId: !Ref 'ServiceScalingTarget'
      StepScalingPolicyConfiguration:
          AdjustmentType: ChangeInCapacity
          Cooldown: 60
          MetricAggregationType: Average
          StepAdjustments:
          - MetricIntervalLowerBound: 0
            ScalingAdjustment: 1

  ServiceScaleInPolicy:
    Type : "AWS::ApplicationAutoScaling::ScalingPolicy"
    Properties:
      PolicyName: ServiceScaleInPolicy
      PolicyType: StepScaling
      ScalingTargetId: !Ref 'ServiceScalingTarget'
      StepScalingPolicyConfiguration:
          AdjustmentType: ChangeInCapacity
          Cooldown: 60
          MetricAggregationType: Average
          StepAdjustments:
          - MetricIntervalUpperBound: 0
            ScalingAdjustment: -1

  CPUScaleOutAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: CPU utilization greater than 15%
      AlarmDescription: Alarm if cpu utilization greater than 15% of reserved cpu
      Namespace: AWS/ECS
      MetricName: CPUUtilization
      Dimensions:
      - Name: ClusterName
        Value: !Ref ECSCluster
      - Name: ServiceName
        Value: !GetAtt ECSService.Name
      Statistic: Maximum
      Period: '60'
      EvaluationPeriods: '1'
      Threshold: '15'
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
      - !Ref ServiceScaleOutPolicy

  CPUScaleInAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: CPU utilization less than 4%
      AlarmDescription: Alarm if cpu utilization less than 4% of reserved cpu
      Namespace: AWS/ECS
      MetricName: CPUUtilization
      Dimensions:
      - Name: ClusterName
        Value: !Ref ECSCluster
      - Name: ServiceName
        Value: !GetAtt ECSService.Name
      Statistic: Maximum
      Period: '60'
      EvaluationPeriods: '4'
      Threshold: '4'
      ComparisonOperator: LessThanThreshold
      AlarmActions:
        - !Ref ServiceScaleInPolicy
  AutoscalingRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Statement:
        - Effect: Allow
          Principal:
            Service: [application-autoscaling.amazonaws.com]
          Action: ['sts:AssumeRole']
      Path: /
      Policies:
      - PolicyName: service-autoscaling
        PolicyDocument:
          Statement:
          - Effect: Allow
            Action: ['application-autoscaling:*', 'cloudwatch:DescribeAlarms', 'cloudwatch:PutMetricAlarm',
              'ecs:DescribeServices', 'ecs:UpdateService']
            Resource: '*'

  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: !Join ['', [!Ref 'AWS::StackName', -frontend-task]]
      ContainerDefinitions:
        - Name: nginx-container
          Image: nginx:latest
          Cpu: '64'
          Memory: '150'
          Essential: 'true'
          Links:
            - "kestrel-container"
          MountPoints: 
            - SourceVolume: "volume-nginx-conf"
              ContainerPath: "/etc/nginx/conf.d/default.conf"
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref 'CloudwatchLogsGroup'
              awslogs-region: !Ref 'AWS::Region'
              awslogs-stream-prefix: task-nginx-container
          PortMappings:
          - ContainerPort: 80
          - ContainerPort: 443

        - Name: kestrel-container
          Image: some-image
          Cpu: '940'
          Memory: '512'
          Essential: 'false'
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: !Ref 'CloudwatchLogsGroup'
              awslogs-region: !Ref 'AWS::Region'
              awslogs-stream-prefix: task-kestrel-container
          PortMappings:
          - ContainerPort: 5443

      Volumes:
          - Host: 
              SourcePath: "/docker-volumes/nginx/nginx.conf"
            Name: "volume-nginx-conf"

【Answer 1】:

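You seem to be confusing service auto scaling with cluster auto scaling. What you configured above scales the service, that is, the number of tasks, based on the CPU usage of the service's own containers. It does not add EC2 instances.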

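What you need to do is auto scale the ECS cluster itself, by adding new EC2 instances once the cluster's overall memory usage (the MemoryReservation metric) reaches a threshold.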
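To see why the second task cannot be placed, it helps to add up what one task reserves. The sketch below uses the container `Cpu` and `Memory` values from the task definition in the question. The instance capacity (1024 CPU units, 985 MiB) is a hypothetical figure chosen for illustration, since the question does not state the instance type.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_units: int
    memory_mib: int


# Container reservations taken from the TaskDefinition in the question.
CONTAINERS = [
    Resources(cpu_units=64, memory_mib=150),   # nginx-container
    Resources(cpu_units=940, memory_mib=512),  # kestrel-container
]


def task_reservation(containers):
    """Sum the CPU units and memory reserved by all containers in one task."""
    return Resources(
        cpu_units=sum(c.cpu_units for c in containers),
        memory_mib=sum(c.memory_mib for c in containers),
    )


def tasks_that_fit(instance, task):
    """Number of task copies an instance can hold, limited by the scarcer resource."""
    return min(
        instance.cpu_units // task.cpu_units,
        instance.memory_mib // task.memory_mib,
    )


# ASSUMPTION: hypothetical registered capacity of a small single-vCPU instance.
instance = Resources(cpu_units=1024, memory_mib=985)

task = task_reservation(CONTAINERS)
print(task)                            # Resources(cpu_units=1004, memory_mib=662)
print(tasks_that_fit(instance, task))  # 1
```

Under these assumed numbers a single task takes up almost the whole instance, so raising the service's desired count to 2 has nowhere to go until another container instance joins the cluster.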

Below is a snippet showing how to configure auto scaling at the cluster level when memory reaches 80%. I can't share the entire CloudFormation template.

ECSInstanceAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    VPCZoneIdentifier:
    - 'Fn::ImportValue':
        !Sub '${VPCStackName}-SubnetPrivateA'
    - 'Fn::ImportValue':
        !Sub '${VPCStackName}-SubnetPrivateB'
    - 'Fn::ImportValue':
        !Sub '${VPCStackName}-SubnetPrivateC'
    LaunchConfigurationName: !Ref 'ECSInstanceLaunchConfiguration'
    MinSize: !Ref 'ECSInstanceCount'
    MaxSize: 6
    DesiredCapacity: !Ref 'ECSInstanceCount'
    MetricsCollection:
      - Granularity: 1Minute

ECSInstanceLaunchConfiguration:
  Type: AWS::AutoScaling::LaunchConfiguration
  Metadata:
    AWS::CloudFormation::Init:
      configSets:
        ConfigCluster:
        - Install
      Install:
        files:
          /home/ec2-user/.aws/config:
            content: !Sub |
              [default]
              region = ${AWS::Region}
            mode: '000755'
            owner: ec2-user
            group: root
          /etc/ecs/ecs.config:
            content: !Sub |
              ECS_CLUSTER=${ECSCluster}
              ECS_ENABLE_CONTAINER_METADATA=true
              ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=20m
              ECS_DISABLE_IMAGE_CLEANUP=false
              ECS_IMAGE_CLEANUP_INTERVAL=10m
              ECS_IMAGE_MINIMUM_CLEANUP_AGE=20m
            mode: '000755'
            owner: root
            group: root
  Properties:
    ImageId: !Ref ECSAMI
    InstanceType: !Ref 'ECSInstanceType'
    AssociatePublicIpAddress: 'false'
    IamInstanceProfile: !Ref ECSClusterRoleInstance
    SecurityGroups:
    - !Ref 'ECSInstanceSecurityGroup'

ECSScalingPolicy:
  Type: 'AWS::AutoScaling::ScalingPolicy'
  Properties:
    AutoScalingGroupName: !Ref ECSInstanceAutoScalingGroup
    PolicyType: TargetTrackingScaling
    TargetTrackingConfiguration:
      CustomizedMetricSpecification:
        MetricName: MemoryReservation
        Namespace: "AWS/ECS"
        Dimensions:
          - Name: ClusterName
            Value: !Sub "ecs-${EnvName}-${EnvNumber}"
        Statistic: Maximum
        Unit: Percent
      TargetValue: 80
      DisableScaleIn: false
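The cluster-level `MemoryReservation` metric used above is the memory reserved by running tasks as a percentage of the memory registered by all container instances in the cluster. The following sketch works through that calculation for the task in the question (150 + 512 = 662 MiB). The instance memory of 985 MiB and the function names are assumptions for illustration only.

```python
def memory_reservation_percent(task_reserved_mib, instance_registered_mib):
    """Cluster memory reservation: reserved MiB over registered MiB, as a percentage."""
    return 100.0 * sum(task_reserved_mib) / sum(instance_registered_mib)


def above_target(reservation_percent, target_percent):
    """True when the metric is above the target, which is when target tracking adds capacity."""
    return reservation_percent > target_percent


# One running task of 662 MiB on one instance.
# ASSUMPTION: 985 MiB of registered memory is a hypothetical instance size.
reservation = memory_reservation_percent([662], [985])

print(round(reservation, 1))          # 67.2
print(above_target(reservation, 80))  # False
print(above_target(reservation, 50))  # True
```

With these assumed sizes a single task keeps the metric below 80, so the target value is worth checking against your own task and instance sizes: it should be low enough that the metric crosses it before the cluster runs out of room for one more task.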
