On EKS, how do I verify I configured a Spot Instance through Terraform?
Posted: 2021-03-19 21:33:40
Question:
I followed this documentation: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/spot-instances.md
and successfully provisioned an EKS cluster.
I ran kubectl describe node and got:
➜ ~ kubectl describe node ip-10-0-1-205.us-east-2.compute.internal
Name: ip-10-0-1-205.us-east-2.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t2.medium
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=us-east-2
failure-domain.beta.kubernetes.io/zone=us-east-2a
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-10-0-1-205.us-east-2.compute.internal
kubernetes.io/os=linux
node.kubernetes.io/instance-type=t2.medium
node.kubernetes.io/lifecycle=spot
prefer=bot
topology.kubernetes.io/region=us-east-2
topology.kubernetes.io/zone=us-east-2a
Annotations: node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 06 Dec 2020 18:09:26 +0200
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: ip-10-0-1-205.us-east-2.compute.internal
AcquireTime: <unset>
RenewTime: Sun, 06 Dec 2020 21:03:06 +0200
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Sun, 06 Dec 2020 20:59:07 +0200 Sun, 06 Dec 2020 18:09:25 +0200 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sun, 06 Dec 2020 20:59:07 +0200 Sun, 06 Dec 2020 18:09:25 +0200 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sun, 06 Dec 2020 20:59:07 +0200 Sun, 06 Dec 2020 18:09:25 +0200 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sun, 06 Dec 2020 20:59:07 +0200 Sun, 06 Dec 2020 18:09:56 +0200 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.0.1.205
Hostname: ip-10-0-1-205.us-east-2.compute.internal
InternalDNS: ip-10-0-1-205.us-east-2.compute.internal
Capacity:
attachable-volumes-aws-ebs: 39
cpu: 2
ephemeral-storage: 104845292Ki
hugepages-2Mi: 0
memory: 4037584Ki
pods: 17
Allocatable:
attachable-volumes-aws-ebs: 39
cpu: 1930m
ephemeral-storage: 95551679124
hugepages-2Mi: 0
memory: 3482576Ki
pods: 17
System Info:
Machine ID: 4283642d849e48e7ac935e6a6574599a
System UUID: EC22BB5A-0463-5D55-ECDD-49865E6294F9
Boot ID: b568afc1-96f2-4669-895e-b3586b7758df
Kernel Version: 4.14.203-156.332.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.6
Kubelet Version: v1.18.9-eks-d1db3c
Kube-Proxy Version: v1.18.9-eks-d1db3c
ProviderID: aws:///us-east-2a/i-045333340f54ac375
Non-terminated Pods: (5 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system aws-node-9bp6w 10m (0%) 0 (0%) 0 (0%) 0 (0%) 173m
kube-system kube-proxy-52l4m 100m (5%) 0 (0%) 0 (0%) 0 (0%) 173m
monitoring prometheus-kube-prometheus-operator-576f4bf45b-wgz5v 0 (0%) 0 (0%) 0 (0%) 0 (0%) 166m
monitoring prometheus-prometheus-node-exporter-tnh6h 0 (0%) 0 (0%) 0 (0%) 0 (0%) 166m
wielder-services bot-b5f557cc-d7b74 1600m (82%) 0 (0%) 1600Mi (47%) 0 (0%) 155m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1710m (88%) 0 (0%)
memory 1600Mi (47%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0
Events: <none>
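In this description, the only indication I can find that the node is a Spot Instance is a label I created myself: node.kubernetes.io/lifecycle=spot
Looking at the node's details in the AWS console, I found:
Termination protection: Disabled
Lifecycle: normal
How can I be sure I provisioned a Spot Instance?
And if I have not provisioned a Spot Instance, how do I do that?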
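Comments:
Lifecycle normal means On-Demand. It would say Lifecycle spot for Spot. You need to set a bid price in the launch configuration in order to provision Spot instead of On-Demand.
@jordanm How do I set the bid price?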
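The Lifecycle value discussed above can also be read from the command line instead of the console. The following is a minimal sketch, not a definitive procedure: it assumes kubectl and the AWS CLI are installed and configured for the cluster's account and region, and the function names are hypothetical. It takes the instance ID from the node's ProviderID (as shown in the kubectl describe node output) and asks EC2 for the instance's InstanceLifecycle field, which EC2 sets to spot for Spot Instances and omits for On-Demand ones.

```shell
# Sketch only: function names are hypothetical. Assumes kubectl and the
# AWS CLI are configured for the cluster's account and region.

# Extract the EC2 instance ID from a node's ProviderID, e.g.
# aws:///us-east-2a/i-045333340f54ac375 -> i-045333340f54ac375
instance_id_from_provider_id() {
  printf '%s\n' "${1##*/}"
}

# Print the InstanceLifecycle of an instance ID. Spot Instances report
# "spot"; On-Demand instances have no such field, which the CLI's text
# output is expected to render as "None".
instance_lifecycle() {
  aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].InstanceLifecycle' \
    --output text
}

# Print the lifecycle of the instance backing a Kubernetes node name.
node_lifecycle() {
  instance_lifecycle "$(
    instance_id_from_provider_id "$(
      kubectl get node "$1" -o jsonpath='{.spec.providerID}'
    )"
  )"
}
```

For the node in the question this would be called as node_lifecycle ip-10-0-1-205.us-east-2.compute.internal. Unlike the node.kubernetes.io/lifecycle label, which is whatever was passed to the kubelet, this value comes from EC2 itself.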
Answer 1:
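Following @jordanm's suggestion, I searched through a lot of documentation and eventually found something similar at https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/spot_instance_request and https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/spot-instances.md
and found this syntax in the relevant Terraform configuration:
spot_price = "1.10"
This is how it looks in the EKS module: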
module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  cluster_name    = local.cluster_name
  cluster_version = var.kube_version
  subnets         = module.vpc.private_subnets
  vpc_id          = module.vpc.vpc_id
  enable_irsa     = true

  tags = {
    Environment = "training"
    GithubRepo  = "terraform-aws-eks"
    GithubOrg   = "terraform-aws-modules"
  }

  worker_groups = [
    {
      name                          = "worker-group-bot"
      instance_type                 = var.bot_instance
      disk_size                     = var.bot_disk_size
      additional_userdata           = "echo foo bar"
      additional_security_group_ids = [aws_security_group.worker_group_mgmt_two.id]
      asg_desired_capacity          = var.bot_desired
      asg_max_size                  = var.bot_max
      // availability_zones = [var.availability_zone]
      // subnets = [module.vpc.private_subnets[0]]
      kubelet_extra_args  = "--node-labels=node.kubernetes.io/lifecycle=spot,prefer=bot"
      suspended_processes = ["AZRebalance"]
      spot_price          = "1.10"
      tags = [
        {
          "key"                 = "k8s.io/cluster-autoscaler/enabled"
          "propagate_at_launch" = "false"
          "value"               = "true"
        },
        {
          "key"                 = "k8s.io/cluster-autoscaler/${local.cluster_name}"
          "propagate_at_launch" = "false"
          "value"               = "true"
        }
      ]
    },
  ]

  workers_additional_policies = ["arn:aws:iam::aws:policy/AutoScalingFullAccess"]
}
If anyone knows where this is documented, please leave a comment.
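One way to check that the spot_price setting actually took effect after terraform apply is to look at what AWS recorded, rather than at the Terraform source. The sketch below is an illustration under assumptions: it assumes the module version in use turns worker_groups into Auto Scaling launch configurations (launch templates are not covered here), that the AWS CLI is configured for the right account and region, and the function names are hypothetical.

```shell
# Sketch only: function names are hypothetical. Assumes the AWS CLI is
# configured with credentials and the cluster's region.

# List every launch configuration with its SpotPrice. The CLI's text
# output is expected to print "None" when no Spot price is set.
launch_config_spot_prices() {
  aws autoscaling describe-launch-configurations \
    --query 'LaunchConfigurations[].[LaunchConfigurationName,SpotPrice]' \
    --output text
}

# List active Spot requests and the instances fulfilling them.
active_spot_requests() {
  aws ec2 describe-spot-instance-requests \
    --filters Name=state,Values=active \
    --query 'SpotInstanceRequests[].[InstanceId,SpotPrice,State]' \
    --output text
}

# Succeed when a SpotPrice value (as printed above) indicates a Spot bid.
has_spot_price() {
  case "$1" in
    ""|None|null) return 1 ;;
    *) return 0 ;;
  esac
}
```

If the worker group's launch configuration shows a SpotPrice of 1.10 and the node's instance ID appears in the active Spot requests, the node was provisioned as Spot. A launch configuration without a SpotPrice would match the Lifecycle normal seen in the console.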