Can't execute a user_data script or remote-exec with a connection block while launching an EC2 instance with Terraform Cloud [aws-provider]
Posted: 2021-11-22 14:39:14

Question: I have created an AWS infrastructure with a network ACL, security groups, subnets, etc. [code attached at the bottom], on the free tier. I can establish an SSH connection to my EC2 instance, and once logged in I can also download packages manually.
However, since I want to take full advantage of Terraform, I would like some things to be pre-installed when Terraform creates the instance. The commands I want to execute are simple (install the JDK, Python, and Docker):
```hcl
user_data = <<-EOF
  #!/bin/bash
  echo "Installing modules..."
  sudo apt-get update
  sudo apt-get install -y openjdk-8-jdk
  sudo apt install -y python2.7 python-pip
  sudo apt install -y docker.io
  sudo systemctl start docker
  sudo systemctl enable docker
  pip install setuptools
  echo "Modules installed via Terraform"
EOF
```
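For context, this script is passed to the instance through the `user_data` argument of the `aws_instance` resource; a minimal sketch of the wiring (the AMI and instance type mirror the values used later in this post):

```hcl
resource "aws_instance" "production_server" {
  ami           = "ami-09e67e426f25ce0d7" # Ubuntu Focal 20.04 LTS
  instance_type = "t2.micro"

  # cloud-init executes this script exactly once, on first boot;
  # it is never re-run on an instance that already exists.
  user_data = <<-EOF
    #!/bin/bash
    echo "Installing modules..."
    sudo apt-get update
  EOF
}
```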
My first approach was to use the user_data argument. Even though the EC2 instance has internet access, none of the specified modules were installed. I then used a remote-exec block together with the connection block Terraform provides. But, as many of us have experienced before, Terraform could not establish a successful connection to the host and returned the message below.
The remote-exec block:

```hcl
connection {
  type        = "ssh"
  host        = aws_eip.prod_server_public_ip.public_ip // Error: host for provisioner cannot be empty -> https://github.com/hashicorp/terraform-provider-aws/issues/10977
  user        = "ubuntu"
  private_key = "${chomp(tls_private_key.ssh_key_prod.private_key_pem)}"
  timeout     = "1m"
}

provisioner "remote-exec" {
  inline = [
    "echo 'Installing modules...'",
    "sudo apt-get update",
    "sudo apt-get install -y openjdk-8-jdk",
    "sudo apt install -y python2.7 python-pip",
    "sudo apt install -y docker.io",
    "sudo systemctl start docker",
    "sudo systemctl enable docker",
    "pip install setuptools",
    "echo 'Modules installed via Terraform'"
  ]
  on_failure = fail
}
```
The i/o timeout message:
```
Connecting to remote host via SSH...
module.virtual_machines.null_resource.install_modules (remote-exec):   Host: 3.137.111.207
module.virtual_machines.null_resource.install_modules (remote-exec):   User: ubuntu
module.virtual_machines.null_resource.install_modules (remote-exec):   Password: false
module.virtual_machines.null_resource.install_modules (remote-exec):   Private key: true
module.virtual_machines.null_resource.install_modules (remote-exec):   Certificate: false
module.virtual_machines.null_resource.install_modules (remote-exec):   SSH Agent: false
module.virtual_machines.null_resource.install_modules (remote-exec):   Checking Host Key: false
module.virtual_machines.null_resource.install_modules (remote-exec):   Target Platform: unix
timeout - last error: dial tcp 52.15.178.40:22: i/o timeout
```
One possible root cause I can think of is that I only allow 2 specific IP addresses through the security group's inbound rules. So when Terraform Cloud tries to connect, it does so from an IP the security group doesn't know. If that is the case, which IP address would let Terraform connect to my VM and pre-install the packages?
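For reference, the restrictive ingress rule in question would look roughly like this (a sketch: the two /32 CIDR blocks are placeholders, not the actual allowed addresses):

```hcl
# Hypothetical SSH ingress rule limited to two known admin addresses.
# The /32 CIDRs below are illustration-only placeholders.
resource "aws_security_group_rule" "ssh_inbound_rule_prod" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["203.0.113.10/32", "203.0.113.20/32"]
  security_group_id = aws_security_group.sg_prod.id
  description       = "ssh access restricted to two known addresses"
}
```

With a rule of this shape, a remote-exec provisioner run as a Terraform Cloud remote operation connects from one of HashiCorp's worker machines, whose addresses are not in the list, so the SSH attempt would time out exactly as described.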
The Terraform code for the infrastructure is attached.
Comments:
What shows up in the server's log files when the user data script runs? You should check /var/log/syslog and /var/log/user-data.log. Also, adding user data to an existing EC2 instance doesn't actually do anything; it is a script that runs when the server is created.
@MarkB Let me check the server's logs... The EC2 instance is created by Terraform, and even if an instance already exists, any small change I make to the infrastructure code will recreate the instance after destroying it first.
@MarkB You are actually right. Docker and Java were indeed installed, according to the syslog file. Only Python was not installed, I guess because python2.7 is no longer supported. So user_data really is the right approach after all. Not knowing about syslog from the start was really inconvenient.
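Given that finding, a user_data script adjusted for Focal would swap the Python 2 packages for their Python 3 counterparts; a sketch (package names assumed from Ubuntu 20.04 defaults, not verified here):

```hcl
user_data = <<-EOF
  #!/bin/bash
  echo "Installing modules..."
  sudo apt-get update
  sudo apt-get install -y openjdk-8-jdk
  # Focal ships Python 3; python-pip is no longer in the default repos
  sudo apt-get install -y python3 python3-pip
  sudo apt-get install -y docker.io
  sudo systemctl start docker
  sudo systemctl enable docker
  pip3 install setuptools
  echo "Modules installed via Terraform"
EOF
```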
What is the full code for your EC2 instance? What OS are you using?
@Marcin I have uploaded the Terraform code for the EC2 instance. To answer your question, I am using Ubuntu Focal 20.04 LTS.
Answer 1:
I ran your code in my sandbox environment and remote-exec worked. I had to make some changes (region, AMI, security groups, ...) just to get your code to run at all, so you can review the modified code and take it from there. The code below worked for me without any issues.
```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 3.0"
    }
  }
}

variable "prefix" {
  default = "my"
}

# Create virtual private cloud (vpc)
resource "aws_vpc" "vpc_prod" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = {
    Name = "production-private-cloud"
  }
}

# Assign gateway to vpc
resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.vpc_prod.id
  tags = {
    Name = "production-igw"
  }
}

# ---------------------------------------- Step 1: Create two subnets ----------------------------------------
data "aws_availability_zones" "available" {
  state = "available"
}

resource "aws_subnet" "subnet_prod" {
  vpc_id                  = aws_vpc.vpc_prod.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a" # data.aws_availability_zones.available.names[0]
  depends_on              = [aws_internet_gateway.gw]
  map_public_ip_on_launch = true
  tags = {
    Name = "main-public-1"
  }
}

resource "aws_subnet" "subnet_prod_id2" {
  vpc_id            = aws_vpc.vpc_prod.id
  cidr_block        = "10.0.2.0/24" // a second subnet can't use the same cidr block as the first subnet
  availability_zone = "us-east-1b" # data.aws_availability_zones.available.names[1]
  depends_on        = [aws_internet_gateway.gw]
  tags = {
    Name = "main-public-2"
  }
}

# ---------------------------------------- Step 2: Create ACL network/ rules ----------------------------------------
resource "aws_network_acl" "production_acl_network" {
  vpc_id     = aws_vpc.vpc_prod.id
  subnet_ids = [aws_subnet.subnet_prod.id, aws_subnet.subnet_prod_id2.id] # assign the created subnets to the ACL, otherwise the NACL is assigned to a default subnet
  tags = {
    Name = "production-network-acl"
  }
}

# Create acl rules for the network
# ACL inbound
resource "aws_network_acl_rule" "all_inbound_traffic_acl" {
  network_acl_id = aws_network_acl.production_acl_network.id
  rule_number    = 180
  protocol       = -1
  rule_action    = "allow"
  cidr_block     = "0.0.0.0/0"
  from_port      = 0
  to_port        = 0
}

# ACL outbound
resource "aws_network_acl_rule" "all_outbound_traffic_acl" {
  network_acl_id = aws_network_acl.production_acl_network.id
  egress         = true
  protocol       = -1
  rule_action    = "allow"
  rule_number    = 180
  cidr_block     = "0.0.0.0/0"
  from_port      = 0
  to_port        = 0
}

# ---------------------------------------- Step 3: Create security group/ rules ----------------------------------------
resource "aws_security_group" "sg_prod" {
  name   = "production-security-group"
  vpc_id = aws_vpc.vpc_prod.id
}

# Create first (inbound) security rule to open port 22 for ssh connection requests
resource "aws_security_group_rule" "ssh_inbound_rule_prod" {
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] # aws_vpc.vpc_prod.cidr_block, "0.0.0.0/0"
  security_group_id = aws_security_group.sg_prod.id
  description       = "security rule to open port 22 for ssh connection"
}

# Create fifth (inbound) security rule to allow pings of the ec2 instance's public ip from the local machine
resource "aws_security_group_rule" "ping_public_ip_sg_rule" {
  type              = "ingress"
  from_port         = 8
  to_port           = 0
  protocol          = "icmp"
  cidr_blocks       = ["0.0.0.0/0"] # aws_vpc.vpc_prod.cidr_block, "0.0.0.0/0"
  security_group_id = aws_security_group.sg_prod.id
  description       = "allow pinging elastic public ipv4 address of ec2 instance from local machine"
}

#--------------------------------
# Create first (outbound) security rule to open port 80 for HTTP requests (helps to download packages while connected to the vm)
resource "aws_security_group_rule" "http_outbound_rule_prod" {
  type              = "egress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] # aws_vpc.vpc_prod.cidr_block, "0.0.0.0/0"
  security_group_id = aws_security_group.sg_prod.id
  description       = "security rule to open port 80 for outbound connection with http from remote server"
}

# Create second (outbound) security rule to open port 443 for HTTPS requests
resource "aws_security_group_rule" "https_outbound_rule_prod" {
  type              = "egress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"] # aws_vpc.vpc_prod.cidr_block, "0.0.0.0/0"
  security_group_id = aws_security_group.sg_prod.id
  description       = "security rule to open port 443 for outbound connection with https from remote server"
}

# ---------------------------------------- Step 4: SSH key generated for accessing VM ----------------------------------------
resource "tls_private_key" "ssh_key_prod" {
  algorithm = "RSA"
  rsa_bits  = 4096
}

# ---------------------------------------- Step 5: Generate aws_key_pair ----------------------------------------
resource "aws_key_pair" "generated_key_prod" {
  key_name   = "${var.prefix}-server-ssh-key"
  public_key = tls_private_key.ssh_key_prod.public_key_openssh
  tags = {
    Name = "SSH key pair for production server"
  }
}

# ---------------------------------------- Step 6: Create network interface ----------------------------------------
resource "aws_network_interface" "network_interface_prod" {
  subnet_id       = aws_subnet.subnet_prod.id
  security_groups = [aws_security_group.sg_prod.id]
  # private_ip    = aws_eip.prod_server_public_ip.private_ip # !!! not sure if this argument is correct !!!
  description     = "Production server network interface"
  tags = {
    Name = "production-network-interface"
  }
}

# ---------------------------------------- Step 7: Create the Elastic Public IP after having created the network interface ----------------------------------------
resource "aws_eip" "prod_server_public_ip" {
  vpc = true
  # instance = aws_instance.production_server.id
  network_interface = aws_network_interface.network_interface_prod.id
  # don't specify both instance and a network_interface id, one of the two!
  depends_on = [aws_internet_gateway.gw, aws_network_interface.network_interface_prod]
  tags = {
    Name = "production-elastic-ip"
  }
}

# ---------------------------------------- Step 8: Associate public ip to network interface ----------------------------------------
resource "aws_eip_association" "eip_assoc" {
  # don't use instance_id and network_interface_id at the same time
  # instance_id        = aws_instance.production_server.id
  allocation_id        = aws_eip.prod_server_public_ip.id
  network_interface_id = aws_network_interface.network_interface_prod.id
  depends_on           = [aws_eip.prod_server_public_ip, aws_network_interface.network_interface_prod]
}

# ---------------------------------------- Step 9: Create route table with rules ----------------------------------------
resource "aws_route_table" "route_table_prod" {
  vpc_id = aws_vpc.vpc_prod.id
  tags = {
    Name = "route-table-production-server"
  }
}

/* documentation =>
https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html#Add_IGW_Routing
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-connect-set-up.html?icmpid=docs_ec2_console#ec2-instance-connect-setup-security-group
*/
resource "aws_route" "route_prod_all" {
  route_table_id         = aws_route_table.route_table_prod.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id             = aws_internet_gateway.gw.id
  depends_on = [
    aws_route_table.route_table_prod, aws_internet_gateway.gw
  ]
}

# Create main route table association with the two subnets
resource "aws_main_route_table_association" "main-route-table" {
  vpc_id         = aws_vpc.vpc_prod.id
  route_table_id = aws_route_table.route_table_prod.id
}

resource "aws_route_table_association" "main-public-1-a" {
  subnet_id      = aws_subnet.subnet_prod.id
  route_table_id = aws_route_table.route_table_prod.id
}

resource "aws_route_table_association" "main-public-1-b" {
  subnet_id      = aws_subnet.subnet_prod_id2.id
  route_table_id = aws_route_table.route_table_prod.id
}

# ---------------------------------------- Step 10: Create the AWS EC2 instance ----------------------------------------
resource "aws_instance" "production_server" {
  depends_on    = [aws_eip.prod_server_public_ip, aws_network_interface.network_interface_prod]
  ami           = "ami-09e67e426f25ce0d7"
  instance_type = "t2.micro"
  # key_name    = "MyKeyPair" # aws_key_pair.generated_key_prod.key_name
  key_name      = aws_key_pair.generated_key_prod.key_name

  network_interface {
    network_interface_id = aws_network_interface.network_interface_prod.id
    device_index         = 0
  }

  ebs_block_device {
    device_name = "/dev/sda1"
    volume_type = "standard"
    volume_size = 8
  }

  connection {
    type        = "ssh"
    host        = aws_eip.prod_server_public_ip.public_ip # Error: host for provisioner cannot be empty -> https://github.com/hashicorp/terraform-provider-aws/issues/10977
    user        = "ubuntu"
    private_key = tls_private_key.ssh_key_prod.private_key_pem
    timeout     = "1m"
  }

  provisioner "remote-exec" {
    inline = [
      "echo 'Installing modules...'",
      "sudo apt-get update",
      "sudo apt-get install -y openjdk-8-jdk",
      "sudo apt install -y python2.7 python-pip",
      "sudo apt install -y docker.io",
      "sudo systemctl start docker",
      "sudo systemctl enable docker",
      "pip install setuptools",
      "echo 'Modules installed via Terraform'"
    ]
    on_failure = fail
  }

  # user_data = <<-EOF
  #   #!/bin/bash
  #   echo "Installing modules..."
  #   sudo apt-get update
  #   sudo apt-get install -y openjdk-8-jdk
  #   sudo apt install -y python2.7 python-pip
  #   sudo apt install -y docker.io
  #   sudo systemctl start docker
  #   sudo systemctl enable docker
  #   pip install setuptools
  #   echo "Modules installed via Terraform"
  # EOF

  tags = {
    Name = "production-server"
  }
  volume_tags = {
    Name = "production-volume"
  }
}
```
Comments:
Marcin, thank you for your reply and the effort of testing the code. The main differences from the code I uploaded are: a) private_key = tls_private_key.ssh_key_prod.private_key_pem, and b) you accept all IP addresses in the inbound security rule. I still get i/o connection timeout ip_address:22. Could you try the code you posted, but this time accepting inbound traffic only from a specific IPv4 address [security group], as in my example?
@NikSp I just tried that, and it works as well. Maybe you used the wrong IP in the rule?
I will double-check my local machine's IPv4 address, since it is plugged into an Ethernet connection rather than wifi. Maybe that change affected the IPv4 address.
@NikSp But did you first check with the 0.0.0.0/0 cidrs? That way you can narrow the problem down to the IP address alone.
Marcin, the line private_key = tls_private_key.ssh_key_prod.private_key_pem is, I guess, not a valid expression for my Terraform setup, so I reverted it to its original form (as in my code), private_key = "${chomp(tls_private_key.ssh_key_prod.private_key_pem)}", and everything works. Still, leaving port 22 (ssh) open to all IP addresses is not what I want to achieve. In any case, I will accept your answer since it solved my problem. Glad you helped me.