On EKS, how do I verify that I configured a Spot Instance through Terraform?

Posted: 2021-03-19 21:33:40

Question:

I followed this documentation: https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/spot-instances.md

and successfully provisioned an EKS cluster.

Running kubectl describe node against one of the nodes gives:

➜  ~ kubectl describe node ip-10-0-1-205.us-east-2.compute.internal
Name:               ip-10-0-1-205.us-east-2.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t2.medium
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=us-east-2
                    failure-domain.beta.kubernetes.io/zone=us-east-2a
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-10-0-1-205.us-east-2.compute.internal
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=t2.medium
                    node.kubernetes.io/lifecycle=spot
                    prefer=bot
                    topology.kubernetes.io/region=us-east-2
                    topology.kubernetes.io/zone=us-east-2a
Annotations:        node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 06 Dec 2020 18:09:26 +0200
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  ip-10-0-1-205.us-east-2.compute.internal
  AcquireTime:     <unset>
  RenewTime:       Sun, 06 Dec 2020 21:03:06 +0200
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Sun, 06 Dec 2020 20:59:07 +0200   Sun, 06 Dec 2020 18:09:25 +0200   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Sun, 06 Dec 2020 20:59:07 +0200   Sun, 06 Dec 2020 18:09:25 +0200   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Sun, 06 Dec 2020 20:59:07 +0200   Sun, 06 Dec 2020 18:09:25 +0200   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Sun, 06 Dec 2020 20:59:07 +0200   Sun, 06 Dec 2020 18:09:56 +0200   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   10.0.1.205
  Hostname:     ip-10-0-1-205.us-east-2.compute.internal
  InternalDNS:  ip-10-0-1-205.us-east-2.compute.internal
Capacity:
  attachable-volumes-aws-ebs:  39
  cpu:                         2
  ephemeral-storage:           104845292Ki
  hugepages-2Mi:               0
  memory:                      4037584Ki
  pods:                        17
Allocatable:
  attachable-volumes-aws-ebs:  39
  cpu:                         1930m
  ephemeral-storage:           95551679124
  hugepages-2Mi:               0
  memory:                      3482576Ki
  pods:                        17
System Info:
  Machine ID:                 4283642d849e48e7ac935e6a6574599a
  System UUID:                EC22BB5A-0463-5D55-ECDD-49865E6294F9
  Boot ID:                    b568afc1-96f2-4669-895e-b3586b7758df
  Kernel Version:             4.14.203-156.332.amzn2.x86_64
  OS Image:                   Amazon Linux 2
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.6
  Kubelet Version:            v1.18.9-eks-d1db3c
  Kube-Proxy Version:         v1.18.9-eks-d1db3c
ProviderID:                   aws:///us-east-2a/i-045333340f54ac375
Non-terminated Pods:          (5 in total)
  Namespace                   Name                                                    CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------                   ----                                                    ------------  ----------  ---------------  -------------  ---
  kube-system                 aws-node-9bp6w                                          10m (0%)      0 (0%)      0 (0%)           0 (0%)         173m
  kube-system                 kube-proxy-52l4m                                        100m (5%)     0 (0%)      0 (0%)           0 (0%)         173m
  monitoring                  prometheus-kube-prometheus-operator-576f4bf45b-wgz5v    0 (0%)        0 (0%)      0 (0%)           0 (0%)         166m
  monitoring                  prometheus-prometheus-node-exporter-tnh6h               0 (0%)        0 (0%)      0 (0%)           0 (0%)         166m
  wielder-services            bot-b5f557cc-d7b74                                      1600m (82%)   0 (0%)      1600Mi (47%)     0 (0%)         155m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                    Requests      Limits
  --------                    --------      ------
  cpu                         1710m (88%)   0 (0%)
  memory                      1600Mi (47%)  0 (0%)
  ephemeral-storage           0 (0%)        0 (0%)
  hugepages-2Mi               0 (0%)        0 (0%)
  attachable-volumes-aws-ebs  0             0
Events:                       <none>

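The only indication in this output that the node is a Spot instance is the label I created myself, node.kubernetes.io/lifecycle=spot.

Looking at the node in the AWS console, I see:

Termination protection: Disabled

Lifecycle: normal

How can I be sure that I have provisioned a Spot instance?

If I have not provisioned a Spot instance, how do I do so?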

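Comments:

Lifecycle normal means On-Demand. It would say Lifecycle spot for a Spot instance. You need to set a bid price in the launch configuration so that Spot capacity is provisioned instead of On-Demand.

@jordanm How do I set the bid price?

Answer 1: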

Following @jordanm's suggestion, I searched through a lot of documentation and finally found something similar at https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/spot_instance_request (the module's own Spot guide is https://github.com/terraform-aws-modules/terraform-aws-eks/blob/master/docs/spot-instances.md)

and found this syntax in the relevant Terraform configuration:

spot_price = "1.10"
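
The value of spot_price is the maximum you are willing to pay per instance-hour; what you are actually charged is the current Spot price. As a rough sketch for picking a value, the current Spot price can be looked up with the AWS CLI. The function name current_spot_price is made up for illustration, t2.medium and us-east-2a are taken from the node description in the question, and the "Linux/UNIX" product description is an assumption.

```shell
# Hypothetical helper: print the most recent Spot price (USD per hour)
# for an instance type ($1) in one availability zone ($2).
# Assumes the AWS CLI is installed and configured for the zone's region.
current_spot_price() {
  aws ec2 describe-spot-price-history \
    --instance-types "$1" \
    --availability-zone "$2" \
    --product-descriptions "Linux/UNIX" \
    --start-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --query 'SpotPriceHistory[0].SpotPrice' \
    --output text
}

# Example call, using the instance type and zone of the node in the question:
# current_spot_price t2.medium us-east-2a
```

A spot_price set well above the current price mostly acts as a ceiling: the request stays eligible as long as the market price is below it.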

Here is how spot_price looks inside the EKS module:

module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  cluster_name    = local.cluster_name
  cluster_version = var.kube_version
  subnets         = module.vpc.private_subnets
  vpc_id          = module.vpc.vpc_id
  enable_irsa     = true

  tags = {
    Environment = "training"
    GithubRepo  = "terraform-aws-eks"
    GithubOrg   = "terraform-aws-modules"
  }

  worker_groups = [
    {
      name                          = "worker-group-bot"
      instance_type                 = var.bot_instance
      disk_size                     = var.bot_disk_size
      additional_userdata           = "echo foo bar"
      additional_security_group_ids = [aws_security_group.worker_group_mgmt_two.id]
      asg_desired_capacity          = var.bot_desired
      asg_max_size                  = var.bot_max

      //      availability_zones = [var.availability_zone]
      //      subnets         = [module.vpc.private_subnets[0]]

      // This label is only informational; it does not make the node a Spot instance.
      kubelet_extra_args      = "--node-labels=node.kubernetes.io/lifecycle=spot,prefer=bot"

      suspended_processes     = ["AZRebalance"]
      // Maximum price per instance-hour; setting it requests Spot instead of On-Demand capacity.
      spot_price              = "1.10"

      tags = [
        {
          "key"                 = "k8s.io/cluster-autoscaler/enabled"
          "propagate_at_launch" = "false"
          "value"               = "true"
        },
        {
          "key"                 = "k8s.io/cluster-autoscaler/${local.cluster_name}"
          "propagate_at_launch" = "false"
          "value"               = "true"
        }
      ]
    },
  ]

  workers_additional_policies = ["arn:aws:iam::aws:policy/AutoScalingFullAccess"]
}


If anyone knows where this is documented, please leave a comment.
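
To confirm that the change took effect, the node's actual lifecycle can be read from EC2 rather than from the self-assigned label. The following is a minimal sketch, assuming kubectl and the AWS CLI are both configured for this cluster and region; all three function names are made up for illustration. EC2 reports InstanceLifecycle as spot for Spot instances and leaves the field out for On-Demand ones, which the CLI prints as None in text output.

```shell
# Hypothetical helper: extract the EC2 instance ID from a Kubernetes providerID
# such as aws:///us-east-2a/i-045333340f54ac375 (everything after the last slash).
instance_id_from_provider_id() {
  printf '%s\n' "${1##*/}"
}

# Hypothetical helper: print "spot" for a Spot instance and "on-demand" when
# EC2 reports no lifecycle; any other value is printed unchanged.
instance_lifecycle() {
  _lifecycle="$(aws ec2 describe-instances \
    --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].InstanceLifecycle' \
    --output text)"
  case "$_lifecycle" in
    spot)    echo "spot" ;;
    None|"") echo "on-demand" ;;
    *)       echo "$_lifecycle" ;;
  esac
}

# Hypothetical helper: look up the lifecycle of a node by its Kubernetes name.
node_lifecycle() {
  _provider_id="$(kubectl get node "$1" -o jsonpath='{.spec.providerID}')"
  instance_lifecycle "$(instance_id_from_provider_id "$_provider_id")"
}

# Example call with a node name taken from `kubectl get nodes`:
# node_lifecycle ip-10-0-1-205.us-east-2.compute.internal
```

If this still prints on-demand after terraform apply, the existing instances may need to be replaced before the new launch configuration shows up on the nodes.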
