Amazon ECS not scaling out instances
Posted: 2019-02-25 12:41:49

Question:
Can anyone explain why my ECS stack isn't scaling out new EC2 instances?
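I configured my ECS stack with CloudFormation, and the initial setup works fine. As soon as the stack starts, a process kicks off that keeps CPU load above 90%. This is deliberate, so that the scale-out alarm fires for testing.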
I set up scaling alarms that trigger the scale-out policy when CPU > 15% and the scale-in policy when CPU < 4%.
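As a rough illustration of the service-level scaling described above, the Python sketch below models how an alarm breach becomes a new desired task count. It uses the step adjustments from the template in this question (lower bound 0 gives +1, upper bound 0 gives -1) and clamps the result to MinCapacity 1 and MaxCapacity 3. It is a simplified, hypothetical model: the function names are invented, only one datapoint is evaluated, cooldowns are ignored, and step-boundary inclusivity is simplified. Note that nothing in this path changes the number of EC2 instances.

```python
# Hypothetical, simplified model of CloudWatch alarm + Application Auto Scaling
# step scaling acting on ecs:service:DesiredCount.
# Thresholds, step bounds and capacity limits mirror the question's template;
# cooldowns and multi-datapoint evaluation are ignored.

def step_adjustment(metric, threshold, steps):
    """Return the ScalingAdjustment of the first step whose bounds contain
    (metric - threshold). Bounds are relative to the alarm threshold;
    None means unbounded. Boundary inclusivity is simplified."""
    delta = metric - threshold
    for lower, upper, adjustment in steps:
        if (lower is None or delta >= lower) and (upper is None or delta < upper):
            return adjustment
    return 0

def new_desired_count(current, adjustment, min_capacity, max_capacity):
    """Apply a ChangeInCapacity adjustment, clamped to the scalable target's limits."""
    return max(min_capacity, min(max_capacity, current + adjustment))

SCALE_OUT_STEPS = [(0, None, 1)]   # MetricIntervalLowerBound: 0 -> +1 task
SCALE_IN_STEPS = [(None, 0, -1)]   # MetricIntervalUpperBound: 0 -> -1 task

cpu = 90       # the test load keeps service CPU above 90%
desired = 1    # MinCapacity
if cpu > 15:   # CPUScaleOutAlarm: GreaterThanThreshold 15
    desired = new_desired_count(desired, step_adjustment(cpu, 15, SCALE_OUT_STEPS), 1, 3)
print(desired)  # 2 -> only the task count changes, not the instance count
```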
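The log then reports the following:
Message: service ECSService-12BBO1EE3SRUF was unable to place a task because no container instance met all of its requirements. The closest matching container instance 149e8eea-a8bc-433f-abbb-9a49c3a3c5b5 has insufficient memory available. For more information, see the Troubleshooting section.
Message: Successfully set desired count to 2. Waiting for change to be fulfilled by ecs. Cause: monitor alarm CPU utilization greater than 5% in state ALARM triggered policy ServiceScaleOutPolicy 155194fc-ee07-46ff-a822-018bd704602b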
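It looks like ECS is trying to place more tasks on the same instance instead of increasing the number of instances and putting the new tasks there. How do I get ECS to scale out to new instances and place the new tasks on them?
My CloudFormation scaling configuration looks like this: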
ECSAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  DependsOn: ECSALB
  Properties:
    VPCZoneIdentifier: !Ref 'SubnetId'
    LaunchConfigurationName: !Ref 'ContainerInstances'
    MinSize: !Ref 'DesiredCapacity'
    MaxSize: !Ref 'MaxSize'
    DesiredCapacity: !Ref 'DesiredCapacity'
    HealthCheckGracePeriod: 320
  CreationPolicy:
    ResourceSignal:
      Timeout: PT15M
  UpdatePolicy:
    AutoScalingReplacingUpdate:
      WillReplace: 'true'
    AutoScalingRollingUpdate:
      MinInstancesInService: '1'
      MaxBatchSize: '1'
      PauseTime: PT15M
      WaitOnResourceSignals: 'true'
ServiceScalingTarget:
  Type: AWS::ApplicationAutoScaling::ScalableTarget
  DependsOn: ECSService
  Properties:
    MaxCapacity: 3
    MinCapacity: 1
    ResourceId: !Join ['', [service/, !Ref 'ECSCluster', /, !GetAtt [ECSService, Name]]]
    RoleARN: !GetAtt [AutoscalingRole, Arn]
    ScalableDimension: ecs:service:DesiredCount
    ServiceNamespace: ecs
ServiceScaleOutPolicy:
  Type: "AWS::ApplicationAutoScaling::ScalingPolicy"
  Properties:
    PolicyName: ServiceScaleOutPolicy
    PolicyType: StepScaling
    ScalingTargetId: !Ref 'ServiceScalingTarget'
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity
      Cooldown: 60
      MetricAggregationType: Average
      StepAdjustments:
        - MetricIntervalLowerBound: 0
          ScalingAdjustment: 1
ServiceScaleInPolicy:
  Type: "AWS::ApplicationAutoScaling::ScalingPolicy"
  Properties:
    PolicyName: ServiceScaleInPolicy
    PolicyType: StepScaling
    ScalingTargetId: !Ref 'ServiceScalingTarget'
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity
      Cooldown: 60
      MetricAggregationType: Average
      StepAdjustments:
        - MetricIntervalUpperBound: 0
          ScalingAdjustment: -1
CPUScaleOutAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: CPU utilization greater than 15%
    AlarmDescription: Alarm if cpu utilization greater than 15% of reserved cpu
    Namespace: AWS/ECS
    MetricName: CPUUtilization
    Dimensions:
      - Name: ClusterName
        Value: !Ref ECSCluster
      - Name: ServiceName
        Value: !GetAtt ECSService.Name
    Statistic: Maximum
    Period: '60'
    EvaluationPeriods: '1'
    Threshold: '15'
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref ServiceScaleOutPolicy
CPUScaleInAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: CPU utilization less than 4%
    AlarmDescription: Alarm if cpu utilization less than 4% of reserved cpu
    Namespace: AWS/ECS
    MetricName: CPUUtilization
    Dimensions:
      - Name: ClusterName
        Value: !Ref ECSCluster
      - Name: ServiceName
        Value: !GetAtt ECSService.Name
    Statistic: Maximum
    Period: '60'
    EvaluationPeriods: '4'
    Threshold: '4'
    ComparisonOperator: LessThanThreshold
    AlarmActions:
      - !Ref ServiceScaleInPolicy
AutoscalingRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Statement:
        - Effect: Allow
          Principal:
            Service: [application-autoscaling.amazonaws.com]
          Action: ['sts:AssumeRole']
    Path: /
    Policies:
      - PolicyName: service-autoscaling
        PolicyDocument:
          Statement:
            - Effect: Allow
              Action: ['application-autoscaling:*', 'cloudwatch:DescribeAlarms', 'cloudwatch:PutMetricAlarm',
                'ecs:DescribeServices', 'ecs:UpdateService']
              Resource: '*'
TaskDefinition:
  Type: AWS::ECS::TaskDefinition
  Properties:
    Family: !Join ['', [!Ref 'AWS::StackName', -frontend-task]]
    ContainerDefinitions:
      - Name: nginx-container
        Image: nginx:latest
        Cpu: '64'
        Memory: '150'
        Essential: 'true'
        Links:
          - "kestrel-container"
        MountPoints:
          - SourceVolume: "volume-nginx-conf"
            ContainerPath: "/etc/nginx/conf.d/default.conf"
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref 'CloudwatchLogsGroup'
            awslogs-region: !Ref 'AWS::Region'
            awslogs-stream-prefix: task-nginx-container
        PortMappings:
          - ContainerPort: 80
          - ContainerPort: 443
      - Name: kestrel-container
        Image: some-image
        Cpu: '940'
        Memory: '512'
        Essential: 'false'
        LogConfiguration:
          LogDriver: awslogs
          Options:
            awslogs-group: !Ref 'CloudwatchLogsGroup'
            awslogs-region: !Ref 'AWS::Region'
            awslogs-stream-prefix: task-kestrel-container
        PortMappings:
          - ContainerPort: 5443
    Volumes:
      - Host:
          SourcePath: "/docker-volumes/nginx/nginx.conf"
        Name: "volume-nginx-conf"
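To see why the scheduler reports insufficient memory, the sketch below adds up the container reservations from the task definition above (64 + 940 CPU units, 150 + 512 MiB). It then greedily places copies of the task on container instances. The instance capacity of 1024 CPU units and 985 MiB is an assumption for illustration only, because the question does not say which instance type is used. Port mappings and placement strategies are ignored, and the function names are hypothetical.

```python
# Hypothetical sketch: how many copies of this task fit on a container instance.
# Container reservations come from the task definition above; the instance's
# registered CPU/memory values used below are ASSUMED, not taken from the question.

CONTAINERS = {
    "nginx-container": {"cpu": 64, "memory": 150},
    "kestrel-container": {"cpu": 940, "memory": 512},
}

def task_reservation(containers):
    """Sum per-container Cpu (units) and Memory (MiB) into the task's total reservation."""
    cpu = sum(c["cpu"] for c in containers.values())
    memory = sum(c["memory"] for c in containers.values())
    return cpu, memory

def place_tasks(desired, instances, task_cpu, task_memory):
    """Greedily place tasks on (cpu, memory) instances; return (placed, pending)."""
    placed = 0
    for cpu, memory in instances:
        # Keep stacking tasks on this instance while both resources remain.
        while placed < desired and cpu >= task_cpu and memory >= task_memory:
            cpu -= task_cpu
            memory -= task_memory
            placed += 1
    return placed, desired - placed

task_cpu, task_memory = task_reservation(CONTAINERS)
print(task_cpu, task_memory)  # 1004 662

# Assumed: one instance registering 1024 CPU units and 985 MiB of memory.
print(place_tasks(2, [(1024, 985)], task_cpu, task_memory))  # (1, 1)
# A second instance of the same assumed size would take the pending task.
print(place_tasks(2, [(1024, 985), (1024, 985)], task_cpu, task_memory))  # (2, 0)
```

With a single instance of that assumed size, a desired count of 2 leaves one task pending, which matches the log message. Only an additional instance gives that task somewhere to run.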
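Answer 1:
It looks like you are mixing up service auto scaling and cluster auto scaling. What you have done above is auto scale the service (its desired task count) based on the CPU usage of the service's own containers.
What you want instead is to auto scale the ECS cluster itself, adding new EC2 instances when the cluster's overall memory reservation reaches a threshold.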
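The cluster-level metric used for this, AWS/ECS MemoryReservation, is the memory reserved by running tasks divided by the memory registered by the cluster's container instances, as a percentage. Below is a minimal Python sketch of that ratio. It reuses the 662 MiB task from the question and an assumed 985 MiB instance; the instance size and the function name are hypothetical.

```python
# Sketch of the cluster-level AWS/ECS MemoryReservation metric:
# (memory reserved by running tasks) / (memory registered by container instances) * 100.
# The instance size used below is an ASSUMED value for illustration.

def memory_reservation_percent(running_task_memory, registered_instance_memory):
    """Cluster MemoryReservation in percent. Tasks that could not be placed
    (still pending) reserve nothing, so they are not counted."""
    return 100 * sum(running_task_memory) / sum(registered_instance_memory)

# One assumed 985 MiB instance running one 662 MiB task; the second task is pending.
pct = memory_reservation_percent([662], [985])
print(round(pct, 1))  # 67.2
```

Under these assumed sizes the metric sits around 67% even though the service wants a second task. The target chosen for the cluster policy therefore has to be set relative to how much of one instance a single running task reserves.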
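Below is a snippet showing how to configure auto scaling at the cluster level so that it scales out when memory reservation reaches 80%. I can't share the whole CloudFormation template.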
ECSInstanceAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    VPCZoneIdentifier:
      - 'Fn::ImportValue':
          !Sub '${VPCStackName}-SubnetPrivateA'
      - 'Fn::ImportValue':
          !Sub '${VPCStackName}-SubnetPrivateB'
      - 'Fn::ImportValue':
          !Sub '${VPCStackName}-SubnetPrivateC'
    LaunchConfigurationName: !Ref 'ECSInstanceLaunchConfiguration'
    MinSize: !Ref 'ECSInstanceCount'
    MaxSize: 6
    DesiredCapacity: !Ref 'ECSInstanceCount'
    MetricsCollection:
      - Granularity: 1Minute
ECSInstanceLaunchConfiguration:
  Type: AWS::AutoScaling::LaunchConfiguration
  Metadata:
    AWS::CloudFormation::Init:
      configSets:
        ConfigCluster:
          - Install
      Install:
        files:
          /home/ec2-user/.aws/config:
            content: !Sub |
              [default]
              region = ${AWS::Region}
            mode: '000755'
            owner: ec2-user
            group: root
          /etc/ecs/ecs.config:
            content: !Sub |
              ECS_CLUSTER=${ECSCluster}
              ECS_ENABLE_CONTAINER_METADATA=true
              ECS_ENGINE_TASK_CLEANUP_WAIT_DURATION=20m
              ECS_DISABLE_IMAGE_CLEANUP=false
              ECS_IMAGE_CLEANUP_INTERVAL=10m
              ECS_IMAGE_MINIMUM_CLEANUP_AGE=20m
            mode: '000755'
            owner: root
            group: root
  Properties:
    ImageId: !Ref ECSAMI
    InstanceType: !Ref 'ECSInstanceType'
    AssociatePublicIpAddress: 'false'
    IamInstanceProfile: !Ref ECSClusterRoleInstance
    SecurityGroups:
      - !Ref 'ECSInstanceSecurityGroup'
ECSScalingPolicy:
  Type: 'AWS::AutoScaling::ScalingPolicy'
  Properties:
    AutoScalingGroupName: !Ref ECSInstanceAutoScalingGroup
    PolicyType: TargetTrackingScaling
    TargetTrackingConfiguration:
      CustomizedMetricSpecification:
        MetricName: MemoryReservation
        Namespace: "AWS/ECS"
        Dimensions:
          - Name: ClusterName
            Value: !Sub "ecs-${EnvName}-${EnvNumber}"
        Statistic: Maximum
        Unit: Percent
      TargetValue: 80
      DisableScaleIn: false
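Target tracking on the Auto Scaling group adjusts DesiredCapacity to keep the metric near TargetValue. AWS does not publish the exact calculation. The sketch below is therefore only a proportional approximation, meant to build intuition for TargetValue: 80, MaxSize: 6 and DisableScaleIn. It computes capacity × metric / target, rounds up, and clamps the result to MinSize and MaxSize. The function and the numbers are hypothetical.

```python
import math

# Rough, hypothetical approximation of target tracking on an Auto Scaling group.
# AWS does not document the exact algorithm; this only shows the proportional idea.

def approx_target_tracking(current_capacity, metric, target, min_size, max_size,
                           disable_scale_in=False):
    """Estimate the next DesiredCapacity: scale capacity by metric/target,
    round up, optionally forbid scale-in, then clamp to [min_size, max_size]."""
    proposed = math.ceil(current_capacity * metric / target)
    if disable_scale_in:
        proposed = max(proposed, current_capacity)
    return max(min_size, min(max_size, proposed))

# Assumed: 2 instances, cluster MemoryReservation at 95%, TargetValue 80, MinSize 2, MaxSize 6.
print(approx_target_tracking(2, 95, 80, 2, 6))  # 3
```

With DisableScaleIn: false, a low metric can also shrink the group, down to MinSize. Setting it to true would only ever add instances.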