Creating an AMI image as part of a CloudFormation stack

Posted 2023-03-04

【Title】Create AMI image as part of a cloudformation stack 【Posted】2014-02-21 06:40:03 【Question】:

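I want to create an EC2 CloudFormation stack that can basically be described by the following steps:

1.- Launch an instance

2.- Provision the instance

3.- Stop the instance and create an AMI image from it

4.- Create an Auto Scaling group that uses the created AMI image as the source for launching new instances.

I can do steps 1 and 2 in one CloudFormation template and step 4 in a second template. What I can't seem to do is create an AMI image from an instance inside a CloudFormation template, which means that if I want to remove the stack, I have to delete the AMI manually.

That being said, my questions are:

1.- Is there a way to create an AMI image from an instance inside the CloudFormation template?

2.- If the answer to 1 is no, is there a way to add an AMI image (or any other resource, for that matter) so that it becomes part of a finished stack?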

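EDIT:

Just to clarify, I have already solved the problem of creating the AMI and using it in a CloudFormation template. What I cannot do is create the AMI inside the CloudFormation template or somehow add it to the created stack.

As I commented on Rico's answer, what I do now is use an Ansible playbook that basically has 3 steps:

1.- Create a base instance with a CloudFormation template

2.- Create, using Ansible, an AMI of the instance created in step 1

3.- Create the rest of the stack (ELB, Auto Scaling group, etc.) with a second CloudFormation template that updates the one created in step 1 and uses the AMI created in step 2 to launch instances.

This is how I manage it now, but I wanted to know whether there is any way to create an AMI inside a CloudFormation template, or whether it is possible to add the created AMI to the stack (something like telling the stack, "Hey, this belongs to you as well, so handle it").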
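
The three-step workflow described above could also be scripted with boto3 instead of Ansible. The following is a minimal sketch under stated assumptions, not the asker's playbook: the function name `build_stack_with_ami`, the logical ID `Instance`, and the `AmiId` template parameter are hypothetical, and the clients are passed in as arguments so the flow can be exercised without AWS credentials. The AMI still lives outside the stack here, so deleting the stack leaves the image behind, which is exactly the limitation the question asks about.

```python
def build_stack_with_ami(cfn, ec2, stack_name, base_template, full_template,
                         instance_logical_id="Instance", ami_parameter="AmiId"):
    """Create a base stack, image its instance, then update the stack to use the AMI.

    `cfn` and `ec2` are expected to be boto3 clients, for example
    boto3.client("cloudformation") and boto3.client("ec2").
    The logical ID and parameter name defaults are illustrative assumptions.
    """
    # Step 1: create the base instance from the first template.
    cfn.create_stack(StackName=stack_name, TemplateBody=base_template)
    cfn.get_waiter("stack_create_complete").wait(StackName=stack_name)

    detail = cfn.describe_stack_resource(
        StackName=stack_name, LogicalResourceId=instance_logical_id)
    instance_id = detail["StackResourceDetail"]["PhysicalResourceId"]

    # Step 2: stop the instance and create an AMI from it.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])
    image = ec2.create_image(InstanceId=instance_id, Name=stack_name + "-ami")
    image_id = image["ImageId"]
    ec2.get_waiter("image_available").wait(ImageIds=[image_id])

    # Step 3: update the same stack with the full template, passing the new AMI ID.
    cfn.update_stack(
        StackName=stack_name,
        TemplateBody=full_template,
        Parameters=[{"ParameterKey": ami_parameter, "ParameterValue": image_id}],
    )
    cfn.get_waiter("stack_update_complete").wait(StackName=stack_name)
    return image_id
```

The function returns the new image ID so that the caller can record it and deregister the image later, since CloudFormation will not do so in this setup.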

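【Comments】:

【Answer 1】:

Yes, you can create an AMI from an EC2 instance within a CloudFormation template by implementing a Custom Resource that calls the CreateImage API on create (and calls the DeregisterImage and DeleteSnapshot APIs on delete).

Since AMIs can sometimes take a long time to create, a Lambda-backed Custom Resource needs to re-invoke itself if the wait has not completed before the Lambda function times out.

Here's a complete example: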

Description: Create an AMI from an EC2 instance.
Parameters:
  ImageId:
    Description: Image ID for base EC2 instance.
    Type: AWS::EC2::Image::Id
    # amzn-ami-hvm-2016.09.1.20161221-x86_64-gp2
    Default: ami-9be6f38c
  InstanceType:
    Description: Instance type to launch EC2 instances.
    Type: String
    Default: m3.medium
    AllowedValues: [ m3.medium, m3.large, m3.xlarge, m3.2xlarge ]
Resources:
  # Completes when the instance is fully provisioned and ready for AMI creation.
  AMICreate:
    Type: AWS::CloudFormation::WaitCondition
    CreationPolicy:
      ResourceSignal:
        Timeout: PT10M
  Instance:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: !Ref ImageId
      InstanceType: !Ref InstanceType
      UserData:
        "Fn::Base64": !Sub |
          #!/bin/bash -x
          yum -y install mysql # provisioning example
          /opt/aws/bin/cfn-signal \
            -e $? \
            --stack ${AWS::StackName} \
            --region ${AWS::Region} \
            --resource AMICreate
          shutdown -h now
  AMI:
    Type: Custom::AMI
    DependsOn: AMICreate
    Properties:
      ServiceToken: !GetAtt AMIFunction.Arn
      InstanceId: !Ref Instance
  AMIFunction:
    Type: AWS::Lambda::Function
    Properties:
      Handler: index.handler
      Role: !GetAtt LambdaExecutionRole.Arn
      Code:
        ZipFile: !Sub |
          var response = require('cfn-response');
          var AWS = require('aws-sdk');
          exports.handler = function(event, context) {
            console.log("Request received:\n", JSON.stringify(event));
            var physicalId = event.PhysicalResourceId;
            function success(data) {
              return response.send(event, context, response.SUCCESS, data, physicalId);
            }
            function failed(e) {
              return response.send(event, context, response.FAILED, e, physicalId);
            }
            // Call ec2.waitFor, continuing if not finished before Lambda function timeout.
            function wait(waiter) {
              console.log("Waiting: ", JSON.stringify(waiter));
              event.waiter = waiter;
              event.PhysicalResourceId = physicalId;
              var request = ec2.waitFor(waiter.state, waiter.params);
              // Shortly before the timeout, abort the wait and re-invoke this function
              // asynchronously with the same event so the wait resumes in a new invocation.
              setTimeout(()=>{
                request.abort();
                console.log("Timeout reached, continuing function. Params:\n", JSON.stringify(event));
                var lambda = new AWS.Lambda();
                lambda.invoke({
                  FunctionName: context.invokedFunctionArn,
                  InvocationType: 'Event',
                  Payload: JSON.stringify(event)
                }).promise().then((data)=>context.done()).catch((err)=>context.fail(err));
              }, context.getRemainingTimeInMillis() - 5000);
              return request.promise().catch((err)=>
                (err.code == 'RequestAbortedError') ?
                  new Promise(()=>context.done()) :
                  Promise.reject(err)
              );
            }
            var ec2 = new AWS.EC2(),
                instanceId = event.ResourceProperties.InstanceId;
            if (event.waiter) {
              // Continuation of an earlier invocation: resume waiting for the image.
              wait(event.waiter).then((data)=>success()).catch((err)=>failed(err));
            } else if (event.RequestType == 'Create' || event.RequestType == 'Update') {
              if (!instanceId) { return failed('InstanceID required'); }
              ec2.waitFor('instanceStopped', {InstanceIds: [instanceId]}).promise()
              .then((data)=>
                ec2.createImage({
                  InstanceId: instanceId,
                  Name: event.RequestId
                }).promise()
              ).then((data)=>
                wait({
                  state: 'imageAvailable',
                  params: {ImageIds: [physicalId = data.ImageId]}
                })
              ).then((data)=>success()).catch((err)=>failed(err));
            } else if (event.RequestType == 'Delete') {
              // Nothing to clean up if the physical ID was never set to an image ID.
              if (physicalId.indexOf('ami-') !== 0) { return success(); }
              ec2.describeImages({ImageIds: [physicalId]}).promise()
              .then((data)=>
                (data.Images.length == 0) ? success() :
                ec2.deregisterImage({ImageId: physicalId}).promise()
              ).then((data)=>
                ec2.describeSnapshots({Filters: [{
                  Name: 'description',
                  Values: ["*" + physicalId + "*"]
                }]}).promise()
              ).then((data)=>
                (data.Snapshots.length === 0) ? success() :
                ec2.deleteSnapshot({SnapshotId: data.Snapshots[0].SnapshotId}).promise()
              ).then((data)=>success()).catch((err)=>failed(err));
            }
          };
      Runtime: nodejs4.3
      Timeout: 300
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal: {Service: [lambda.amazonaws.com]}
          Action: ['sts:AssumeRole']
      Path: /
      ManagedPolicyArns:
      - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
      - arn:aws:iam::aws:policy/service-role/AWSLambdaRole
      Policies:
      - PolicyName: EC2Policy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
              - 'ec2:DescribeInstances'
              - 'ec2:DescribeImages'
              - 'ec2:CreateImage'
              - 'ec2:DeregisterImage'
              - 'ec2:DescribeSnapshots'
              - 'ec2:DeleteSnapshot'
              Resource: ['*']
Outputs:
  AMI:
    Value: !Ref AMI
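
The Delete branch of the handler above only deregisters an image when the resource's physical ID looks like an AMI ID. That guard can be sketched in isolation as follows; this is an illustrative Python sketch rather than part of the template, and the helper name `needs_deregistration` is hypothetical.

```python
def needs_deregistration(physical_id):
    """Return True when the custom resource's physical ID is an AMI ID."""
    # CloudFormation may send a Delete request for a resource whose Create
    # failed before an image existed. In that case the physical ID is not an
    # image ID (or is missing), so there is nothing to deregister.
    return isinstance(physical_id, str) and physical_id.startswith("ami-")
```

Returning success without calling the EC2 API in the non-AMI case lets a failed stack creation roll back cleanly instead of getting stuck on a delete that can never succeed.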

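【Comments】:

Great template! Thanks for sharing.

This is from 2 years ago, but I don't understand: when is the Lambda executed?

What is this "physicalId.indexOf('ami-')..." check about? Care to elaborate?

Once the operation completes, the resource's PhysicalResourceId is set to the ImageId returned by ec2.createImage, and all valid image IDs start with the ami- prefix. When the physical ID is set to a valid image ID, physicalId.indexOf('ami-') equals 0, which means the image needs to be deleted (via ec2.deregisterImage) when the resource is deleted. If the physical ID was never set to a valid image ID before the resource is deleted, the deregisterImage operation can be skipped.

【Answer 2】: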

For what it's worth, here is a Python variant of wjordan's AMIFunction definition from the original answer. All other resources in the original YAML remain unchanged:

AMIFunction:
  Type: AWS::Lambda::Function
  Properties:
    Handler: index.handler
    Role: !GetAtt LambdaExecutionRole.Arn
    Code:
      ZipFile: !Sub |
        import logging
        import cfnresponse
        import json
        import boto3
        from threading import Timer
        from botocore.exceptions import WaiterError

        logger = logging.getLogger()
        logger.setLevel(logging.INFO)

        def handler(event, context):

          ec2 = boto3.resource('ec2')
          physicalId = event['PhysicalResourceId'] if 'PhysicalResourceId' in event else None

          def success(data={}):
            cfnresponse.send(event, context, cfnresponse.SUCCESS, data, physicalId)

          def failed(e):
            # The response Data must be a JSON object, so wrap the error message in a dict.
            cfnresponse.send(event, context, cfnresponse.FAILED, {'Error': str(e)}, physicalId)

          logger.info('Request received: %s\n' % json.dumps(event))

          try:
            instanceId = event['ResourceProperties']['InstanceId']
            if (not instanceId):
              raise Exception('InstanceID required')

            if not 'RequestType' in event:
              success({'Data': 'Unhandled request type'})
              return

            if event['RequestType'] == 'Delete':
              if (not physicalId.startswith('ami-')):
                raise Exception('Unknown PhysicalId: %s' % physicalId)

              # Deregister the image, then delete the EBS snapshots that backed it.
              ec2client = boto3.client('ec2')
              images = ec2client.describe_images(ImageIds=[physicalId])
              for image in images['Images']:
                ec2.Image(image['ImageId']).deregister()
                snapshots = ([bdm['Ebs']['SnapshotId'] 
                              for bdm in image['BlockDeviceMappings'] 
                              if 'Ebs' in bdm and 'SnapshotId' in bdm['Ebs']])
                for snapshot in snapshots:
                  ec2.Snapshot(snapshot).delete()

              success({'Data': 'OK'})
            elif event['RequestType'] in set(['Create', 'Update']):
              if not physicalId:  # AMI creation has not been requested yet
                instance = ec2.Instance(instanceId)
                instance.wait_until_stopped()

                image = instance.create_image(Name="Automatic from CloudFormation stack ${AWS::StackName}")

                physicalId = image.image_id
              else:
                logger.info('Continuing in awaiting image available: %s\n' % physicalId)

              ec2client = boto3.client('ec2')
              waiter = ec2client.get_waiter('image_available')

              try:
                waiter.wait(ImageIds=[physicalId], WaiterConfig={'Delay': 30, 'MaxAttempts': 6})
              except WaiterError as e:
                # Request the same event but set PhysicalResourceId so that the AMI is not created again
                event['PhysicalResourceId'] = physicalId
                logger.info('Timeout reached, continuing function: %s\n' % json.dumps(event))
                lambda_client = boto3.client('lambda')
                lambda_client.invoke(FunctionName=context.invoked_function_arn, 
                                      InvocationType='Event',
                                      Payload=json.dumps(event))
                return

              success({'Data': 'OK'})
            else:
              success({'Data': 'OK'})
          except Exception as e:
            failed(e)
    Runtime: python2.7
    Timeout: 300
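
The waiter settings in the function above (`Delay: 30`, `MaxAttempts: 6`) keep each wait to at most roughly three minutes, which fits inside the 300-second function timeout and leaves time to re-invoke the function. If either value is changed, the same relationship has to hold. A minimal sketch of that arithmetic follows; the helper `max_waiter_attempts` and its default safety margin are hypothetical and not part of boto3.

```python
def max_waiter_attempts(timeout_seconds, delay_seconds, safety_margin_seconds=30):
    """Largest MaxAttempts whose total polling delay fits inside the Lambda timeout.

    The safety margin reserves time for the re-invocation call and for the
    API requests themselves. It is an assumed value, not an AWS recommendation.
    """
    usable = timeout_seconds - safety_margin_seconds
    if usable < delay_seconds:
        return 1  # always poll at least once
    return usable // delay_seconds
```

The estimate is deliberately conservative: it budgets one full delay per attempt, although the first poll happens immediately.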

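【Comments】:

However, I want to terminate the instance right after the image has been created. In the function above, I added the line boto3.resource('ec2').Instance(instanceId).terminate() before the first success() call. But it gives the error "Invalid response object: Value of property Data must be an object". Any ideas?

【Answer 3】: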

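1. No.

2. I suppose so. Once the stack has been created, you can use the "Update Stack" operation. You need to provide the full JSON template of the initial stack plus your changes (the changed AMI) in that same file. I would run this in a test environment first (not production), as I'm not sure what the operation does to existing instances.

Why not create an AMI outside of CloudFormation first and then use that AMI in your final CloudFormation template?

Another option is to write some automation that creates two CloudFormation stacks; you can delete the first one once the AMI you created is finalized.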

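【Comments】:

Rico, if I'm not mistaken (and since I'm doing this right now, I don't think I am), you can modify a stack after it has been created by updating it. Creating the AMI outside of CloudFormation is the way I'm handling it now. Basically I use an Ansible playbook with 3 steps: 1.- Create an instance with CloudFormation 2.- Create an AMI of that instance with Ansible 3.- Create the rest of the stack (updating the created stack) using the AMI created with Ansible. My question really points towards making the AMI part of the stack or part of the CloudFormation steps. I'll update my question to clarify.

@dibits Got it. So I changed my answer to number 2. I now remember there's an "Update Stack" operation. You need to provide the full JSON template of the initial stack plus your changes in that same file.

If I understand you correctly, I would then only be updating the AMI ID in the CloudFormation template, not incorporating the AMI image into the stack. Perhaps to clarify further: since I'm doing everything in one stack, I would like to be able to delete that stack and have the AMI created with it deregistered automatically (as happens with the instances, ELB, etc.).

I believe so. I haven't tried it myself. I'd be curious to know what the outcome is.

【Answer 4】: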

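While @wjordan's solution is good for simple use cases, updating the User Data will not update the AMI.

(DISCLAIMER: I am the original author.) cloudformation-ami aims to let you declare AMIs in CloudFormation that can be reliably created, updated, and deleted. Using cloudformation-ami you can declare custom AMIs like this: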

MyAMI:
  Type: Custom::AMI
  Properties:
    ServiceToken: !ImportValue AMILambdaFunctionArn
    Image:
      Name: my-image
      Description: some description for the image
    TemplateInstance:
      ImageId: ami-467ca739
      IamInstanceProfile:
        Arn: arn:aws:iam::1234567890:instance-profile/MyProfile-ASDNSDLKJ
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash -x
          yum -y install mysql # provisioning example
          # Signal that the instance is ready
          INSTANCE_ID=`wget -q -O - http://169.254.169.254/latest/meta-data/instance-id`
          aws ec2 create-tags --resources $INSTANCE_ID --tags Key=UserDataFinished,Value=true --region ${AWS::Region}
      KeyName: my-key
      InstanceType: t2.nano
      SecurityGroupIds:
      - sg-d7bf78b0
      SubnetId: subnet-ba03aa91
      BlockDeviceMappings:
      - DeviceName: "/dev/xvda"
        Ebs:
          VolumeSize: '10'
          VolumeType: gp2
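
The UserData script above signals readiness by tagging the instance with `UserDataFinished=true`. How cloudformation-ami consumes that tag is not shown here, so the following is only a sketch of one way such a signal could be polled with boto3. The function name `wait_for_user_data_tag` and the attempt and delay defaults are hypothetical; `ec2` is expected to be a boto3 EC2 client.

```python
import time


def wait_for_user_data_tag(ec2, instance_id, attempts=40, delay=15, sleep=time.sleep):
    """Poll until the instance carries the tag UserDataFinished=true.

    Returns True once the tag is found and False if all attempts are used up.
    """
    filters = [
        {"Name": "resource-id", "Values": [instance_id]},
        {"Name": "key", "Values": ["UserDataFinished"]},
        {"Name": "value", "Values": ["true"]},
    ]
    for _ in range(attempts):
        # describe_tags returns only the tags that match every filter.
        if ec2.describe_tags(Filters=filters)["Tags"]:
            return True
        sleep(delay)
    return False
```

For this signal to work, the instance profile would need permission to call `ec2:CreateTags` on itself.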

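【Comments】:

For what it's worth, extending my answer to update the AMI after a UserData change is actually relatively simple: just create new WaitCondition and Custom::AMI resources (or rename the existing definitions, which creates new ones and destroys the old ones). For my own non-simple use case, I wrap the CloudFormation template in an ERB template and include part of the commit hash in the logical IDs, e.g. AMICreate<%=commit%> and AMI<%=commit%>.

Yes, that makes sense! Thanks for the tip.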
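
The renaming trick from the comment above, where a commit hash is embedded in the logical ID so that a changed UserData produces a new resource, can be sketched in Python rather than ERB. The helper name `versioned_logical_id` and the seven-character default are assumptions made for illustration.

```python
import re


def versioned_logical_id(base, commit, length=7):
    """Append a short commit hash to a logical ID, e.g. AMI plus 1a2b3c4."""
    logical_id = base + commit[:length]
    # CloudFormation logical IDs must be alphanumeric.
    if not re.fullmatch(r"[A-Za-z0-9]+", logical_id):
        raise ValueError("logical ID must be alphanumeric: %r" % logical_id)
    return logical_id
```

Applying the same suffix to both the WaitCondition and the Custom::AMI resource keeps the pair in step, so a new commit replaces both and CloudFormation deletes the old AMI through the custom resource's Delete handler.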
