Can't execute a user_data script or remote-exec with a connection block while launching an EC2 instance with Terraform Cloud [aws-provider]
Posted: 2021-11-22 14:39:14

Question: I have created an AWS infrastructure with a network ACL, security groups, subnets, etc. [code attached at the bottom], on the free tier. I can establish an SSH connection to my EC2 instance, and once logged in I can also download packages manually.
However, since I want to take full advantage of Terraform, I would like some things to be pre-installed when Terraform creates the instance. The commands I want to execute are simple (install the JDK, Python, and Docker):
```hcl
user_data = <<-EOF
  #!/bin/bash
  echo "Installing modules..."
  sudo apt-get update
  sudo apt-get install -y openjdk-8-jdk
  sudo apt install -y python2.7 python-pip
  sudo apt install -y docker.io
  sudo systemctl start docker
  sudo systemctl enable docker
  pip install setuptools
  echo "Modules installed via Terraform"
EOF
```
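For context, this script is passed to the instance through the `user_data` argument of the `aws_instance` resource; a minimal sketch of the wiring (the AMI and instance type mirror the values used later in this post):

```hcl
resource "aws_instance" "production_server" {
  ami           = "ami-09e67e426f25ce0d7" # Ubuntu Focal 20.04 LTS
  instance_type = "t2.micro"

  # cloud-init executes this script exactly once, on first boot;
  # it is never re-run on an instance that already exists.
  user_data = <<-EOF
    #!/bin/bash
    echo "Installing modules..."
    sudo apt-get update
  EOF
}
```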
My first approach was to use the user_data argument. Even though the EC2 instance has internet access, none of the specified modules were installed. I then used a remote-exec block together with the connection block Terraform provides. But, as many of us have experienced before, Terraform could not establish a successful connection to the host and returned the message below.
The remote-exec block:

```hcl
connection {
  type        = "ssh"
  host        = aws_eip.prod_server_public_ip.public_ip // Error: host for provisioner cannot be empty -> https://github.com/hashicorp/terraform-provider-aws/issues/10977
  user        = "ubuntu"
  private_key = "${chomp(tls_private_key.ssh_key_prod.private_key_pem)}"
  timeout     = "1m"
}

provisioner "remote-exec" {
  inline = [
    "echo 'Installing modules...'",
    "sudo apt-get update",
    "sudo apt-get install -y openjdk-8-jdk",
    "sudo apt install -y python2.7 python-pip",
    "sudo apt install -y docker.io",
    "sudo systemctl start docker",
    "sudo systemctl enable docker",
    "pip install setuptools",
    "echo 'Modules installed via Terraform'"
  ]
  on_failure = fail
}
```
The i/o timeout message:
```
Connecting to remote host via SSH...
module.virtual_machines.null_resource.install_modules (remote-exec):   Host: 3.137.111.207
module.virtual_machines.null_resource.install_modules (remote-exec):   User: ubuntu
module.virtual_machines.null_resource.install_modules (remote-exec):   Password: false
module.virtual_machines.null_resource.install_modules (remote-exec):   Private key: true
module.virtual_machines.null_resource.install_modules (remote-exec):   Certificate: false
module.virtual_machines.null_resource.install_modules (remote-exec):   SSH Agent: false
module.virtual_machines.null_resource.install_modules (remote-exec):   Checking Host Key: false
module.virtual_machines.null_resource.install_modules (remote-exec):   Target Platform: unix
timeout - last error: dial tcp 52.15.178.40:22: i/o timeout
```
One possible root cause I can think of is that I only allow 2 specific IP addresses through the security group's inbound rules. So when Terraform Cloud tries to connect, it does so from an IP the security group doesn't know. If that is the case, which IP address would let Terraform connect to my VM and pre-install the packages?
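For reference, the restrictive ingress rule in question would look roughly like this (a sketch: the two /32 CIDR blocks are placeholders, not the actual allowed addresses):

```hcl
# Hypothetical SSH ingress rule limited to two known admin addresses.
# The /32 CIDRs below are illustration-only placeholders.
resource "aws_security_group_rule" "ssh_inbound_rule_prod" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["203.0.113.10/32", "203.0.113.20/32"]
  security_group_id = aws_security_group.sg_prod.id
  description       = "ssh access restricted to two known addresses"
}
```

With a rule of this shape, a remote-exec provisioner run as a Terraform Cloud remote operation connects from one of HashiCorp's worker machines, whose addresses are not in the list, so the SSH attempt would time out exactly as described.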
The Terraform code for the infrastructure is attached.
Comments:
What shows up in the server's log files when the user data script runs? You should check /var/log/syslog and /var/log/user-data.log. Also, adding user data to an existing EC2 instance doesn't actually do anything; it is a script that runs when the server is created.
@MarkB Let me check the server's logs... The EC2 instance is created by Terraform, and even if an instance already exists, any small change I make to the infrastructure code will recreate the instance after destroying it first.
@MarkB You are actually right. Docker and Java were indeed installed, according to the syslog file. Only Python was not installed, I guess because python2.7 is no longer supported. So user_data really is the right approach after all. Not knowing about syslog from the start was really inconvenient.
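Given that finding, a user_data script adjusted for Focal would swap the Python 2 packages for their Python 3 counterparts; a sketch (package names assumed from Ubuntu 20.04 defaults, not verified here):

```hcl
user_data = <<-EOF
  #!/bin/bash
  echo "Installing modules..."
  sudo apt-get update
  sudo apt-get install -y openjdk-8-jdk
  # Focal ships Python 3; python-pip is no longer in the default repos
  sudo apt-get install -y python3 python3-pip
  sudo apt-get install -y docker.io
  sudo systemctl start docker
  sudo systemctl enable docker
  pip3 install setuptools
  echo "Modules installed via Terraform"
EOF
```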
What is the full code for your EC2 instance? What OS are you using?
@Marcin I have uploaded the Terraform code for the EC2 instance. To answer your question, I am using Ubuntu Focal 20.04 LTS.
Answer 1:
I ran your code in my sandbox environment and remote-exec worked. I had to make some changes (region, AMI, security groups, ...) just to get your code to run at all, so you can review the modified code and take it from there. The code below worked for me without any issues.
```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

variable "prefix" {
  default = "my"
}

# Create virtual private cloud (vpc)
resource "aws_vpc" "vpc_prod" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = {
    Name = "production-private-cloud"
  }
}

# Assign gateway to vpc
resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.vpc_prod.id
  tags = {
    Name = "production-igw"
  }
}

# ---------------------------------------- Step 1: Create two subnets ----------------------------------------
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "subnet_prod" {
  vpc_id                  = aws_vpc.vpc_prod.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a" # data.aws_availability_zones.available.names[0]
  depends_on              = [aws_internet_gateway.gw]
  map_public_ip_on_launch = true
  tags = {
    Name = "main-public-1"
  }
}

resource "aws_subnet" "subnet_prod_id2" {
  vpc_id            = aws_vpc.vpc_prod.id
  cidr_block        = "10.0.2.0/24" // a second subnet can't use the same cidr block as the first subnet
  availability_zone = "us-east-1b" # data.aws_availability_zones.available.names[1]
  depends_on        = [aws_internet_gateway.gw]
  tags = {
    Name = "main-public-2"
  }
}

# ---------------------------------------- Step 2: Create ACL network/ rules ----------------------------------------
resource "aws_network_acl" "production_acl_network" {
  vpc_id     = aws_vpc.vpc_prod.id
  subnet_ids = [aws_subnet.subnet_prod.id, aws_subnet.subnet_prod_id2.id] # assign the created subnets to the ACL, otherwise the NACL is assigned to a default subnet
  tags = {
    Name = "production-network-acl"
  }
}

# Create acl rules for the network
# ACL inbound
resource "aws_network_acl_rule" "all_inbound_traffic_acl" {
  network_acl_id = aws_network_acl.production_acl_network.id
  rule_number    = 180
  protocol       = -1
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
  from_port      = 0
  to_port        = 0
}

# ACL outbound
resource "aws_network_acl_rule" "all_outbound_traffic_acl" {
  network_acl_id = aws_network_acl.production_acl_network.id
  egress         = true
  protocol       = -1
  rule_action    = "allow"
  rule_number    = 180
  cidr_block     = "0.0.0.0/0"
  from_port      = 0
  to_port        = 0
}

# ---------------------------------------- Step 3: Create security group/ rules ----------------------------------------
resource "aws_security_group" "sg_prod" {
  name   = "production-security-group"
  vpc_id = aws_vpc.vpc_prod.id
}

# Create first (inbound) security rule to open port 22 for ssh connection requests
resource "aws_security_group_rule" "ssh_inbound_rule_prod" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] # aws_vpc.vpc_prod.cidr_block, "0.0.0.0/0"
  security_group_id = aws_security_group.sg_prod.id
  description       = "security rule to open port 22 for ssh connection"
}

# Create fifth (inbound) security rule to allow pings of the ec2 instance's public ip from the local machine
resource "aws_security_group_rule" "ping_public_ip_sg_rule" {
  type              = "ingress"
  from_port         = 8
  to_port           = 0
  protocol          = "icmp"
  cidr_blocks       = ["0.0.0.0/0"] # aws_vpc.vpc_prod.cidr_block, "0.0.0.0/0"
  security_group_id = aws_security_group.sg_prod.id
  description       = "allow pinging elastic public ipv4 address of ec2 instance from local machine"
}

#--------------------------------
# Create first (outbound) security rule to open port 80 for HTTP requests (helps to download packages while connected to the vm)
resource "aws_security_group_rule" "http_outbound_rule_prod" {
  type              = "egress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] # aws_vpc.vpc_prod.cidr_block, "0.0.0.0/0"
  security_group_id = aws_security_group.sg_prod.id
  description       = "security rule to open port 80 for outbound connection with http from remote server"
}

# Create second (outbound) security rule to open port 443 for HTTPS requests
resource "aws_security_group_rule" "https_outbound_rule_prod" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] # aws_vpc.vpc_prod.cidr_block, "0.0.0.0/0"
  security_group_id = aws_security_group.sg_prod.id
  description       = "security rule to open port 443 for outbound connection with https from remote server"
}

# ---------------------------------------- Step 4: SSH key generated for accessing VM ----------------------------------------
resource "tls_private_key" "ssh_key_prod" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

# ---------------------------------------- Step 5: Generate aws_key_pair ----------------------------------------
resource "aws_key_pair" "generated_key_prod" {
  key_name   = "${var.prefix}-server-ssh-key"
  public_key = tls_private_key.ssh_key_prod.public_key_openssh
  tags = {
    Name = "SSH key pair for production server"
  }
}

# ---------------------------------------- Step 6: Create network interface ----------------------------------------
resource "aws_network_interface" "network_interface_prod" {
  subnet_id       = aws_subnet.subnet_prod.id
  security_groups = [aws_security_group.sg_prod.id]
  # private_ip    = aws_eip.prod_server_public_ip.private_ip # !!! not sure if this argument is correct !!!
  description     = "Production server network interface"
  tags = {
    Name = "production-network-interface"
  }
}

# ---------------------------------------- Step 7: Create the Elastic Public IP after having created the network interface ----------------------------------------
resource "aws_eip" "prod_server_public_ip" {
  vpc = true
  # instance = aws_instance.production_server.id
  network_interface = aws_network_interface.network_interface_prod.id
  # don't specify both instance and a network_interface id, one of the two!
  depends_on = [aws_internet_gateway.gw, aws_network_interface.network_interface_prod]
  tags = {
    Name = "production-elastic-ip"
  }
}

# ---------------------------------------- Step 8: Associate public ip to network interface ----------------------------------------
resource "aws_eip_association" "eip_assoc" {
  # don't use instance_id and network_interface_id at the same time
  # instance_id        = aws_instance.production_server.id
  allocation_id        = aws_eip.prod_server_public_ip.id
  network_interface_id = aws_network_interface.network_interface_prod.id
  depends_on           = [aws_eip.prod_server_public_ip, aws_network_interface.network_interface_prod]
}

# ---------------------------------------- Step 9: Create route table with rules ----------------------------------------
resource "aws_route_table" "route_table_prod" {
  vpc_id = aws_vpc.vpc_prod.id
  tags = {
    Name = "route-table-production-server"
  }
}

/* documentation =>
https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html#Add_IGW_Routing
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-set-up.html?icmpid=docs_ec2_console#ec2-instance-connect-setup-security-group
*/
resource "aws_route" "route_prod_all" {
  route_table_id         = aws_route_table.route_table_prod.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.gw.id
  depends_on = [
    aws_route_table.route_table_prod, aws_internet_gateway.gw
  ]
}

# Create main route table association with the two subnets
resource "aws_main_route_table_association" "main-route-table" {
  vpc_id         = aws_vpc.vpc_prod.id
  route_table_id = aws_route_table.route_table_prod.id
}

resource "aws_route_table_association" "main-public-1-a" {
  subnet_id      = aws_subnet.subnet_prod.id
  route_table_id = aws_route_table.route_table_prod.id
}

resource "aws_route_table_association" "main-public-1-b" {
  subnet_id      = aws_subnet.subnet_prod_id2.id
  route_table_id = aws_route_table.route_table_prod.id
}

# ---------------------------------------- Step 10: Create the AWS EC2 instance ----------------------------------------
resource "aws_instance" "production_server" {
  depends_on    = [aws_eip.prod_server_public_ip, aws_network_interface.network_interface_prod]
  ami           = "ami-09e67e426f25ce0d7"
  instance_type = "t2.micro"
  # key_name    = "MyKeyPair" # aws_key_pair.generated_key_prod.key_name
  key_name      = aws_key_pair.generated_key_prod.key_name

  network_interface {
    network_interface_id = aws_network_interface.network_interface_prod.id
    device_index         = 0
  }

  ebs_block_device {
    device_name = "/dev/sda1"
    volume_type = "standard"
    volume_size = 8
  }

  connection {
    type        = "ssh"
    host        = aws_eip.prod_server_public_ip.public_ip # Error: host for provisioner cannot be empty -> https://github.com/hashicorp/terraform-provider-aws/issues/10977
    user        = "ubuntu"
    private_key = tls_private_key.ssh_key_prod.private_key_pem
    timeout     = "1m"
  }

  provisioner "remote-exec" {
    inline = [
      "echo 'Installing modules...'",
      "sudo apt-get update",
      "sudo apt-get install -y openjdk-8-jdk",
      "sudo apt install -y python2.7 python-pip",
      "sudo apt install -y docker.io",
      "sudo systemctl start docker",
      "sudo systemctl enable docker",
      "pip install setuptools",
      "echo 'Modules installed via Terraform'"
    ]
    on_failure = fail
  }

  # user_data = <<-EOF
  #   #!/bin/bash
  #   echo "Installing modules..."
  #   sudo apt-get update
  #   sudo apt-get install -y openjdk-8-jdk
  #   sudo apt install -y python2.7 python-pip
  #   sudo apt install -y docker.io
  #   sudo systemctl start docker
  #   sudo systemctl enable docker
  #   pip install setuptools
  #   echo "Modules installed via Terraform"
  # EOF

  tags = {
    Name = "production-server"
  }
  volume_tags = {
    Name = "production-volume"
  }
}
```
Comments:
Marcin, thank you for your reply and the effort of testing the code. The main differences from the code I uploaded are: a) private_key = tls_private_key.ssh_key_prod.private_key_pem, and b) you accept all IP addresses in the inbound security rule. I still get i/o connection timeout ip_address:22. Could you try the code you posted, but this time accepting inbound traffic only from a specific IPv4 address [security group], as in my example?
@NikSp I just tried that, and it works as well. Maybe you used the wrong IP in the rule?
I will double-check my local machine's IPv4 address, since it is plugged into an Ethernet connection rather than wifi. Maybe that change affected the IPv4 address.
@NikSp But did you first check with the 0.0.0.0/0 cidrs? That way you can narrow the problem down to the IP address alone.
Marcin, the line private_key = tls_private_key.ssh_key_prod.private_key_pem is, I guess, not a valid expression for my Terraform setup, so I reverted it to its original form (as in my code), private_key = "${chomp(tls_private_key.ssh_key_prod.private_key_pem)}", and everything works. Still, leaving port 22 (ssh) open to all IP addresses is not what I want to achieve. In any case, I will accept your answer since it solved my problem. Glad you helped me.