Using cloud-init on an Azure VM to mount a data disk fails

Posted: 2020-07-19 23:13:27

Question:

This is similar to an earlier SO question, which I adapted my code from: How can i use cloud-init to load a datadisk on an ubuntu VM in azure

I use a cloud-config file passed in via Terraform:

#cloud-config
disk_setup:
  /dev/disk/azure/scsi1/lun0:
    table_type: gpt
    layout: true
    overwrite: false

fs_setup:
  - device: /dev/disk/azure/scsi1/lun0
    partition: 1
    filesystem: ext4

mounts:
  - [
      "/dev/disk/azure/scsi1/lun0-part1",
      "/opt/data",
      auto,
      "defaults,noexec,nofail",
    ]

The Terraform that renders the file and passes it to the VM module:

data "template_file" "cloudconfig" {
  template = file("${path.module}/cloud-init.tpl")
}

data "template_cloudinit_config" "config" {
  gzip          = true
  base64_encode = true

  part {
    content_type = "text/cloud-config"
    content      = "${data.template_file.cloudconfig.rendered}"
  }
}

module "nexus_test_vm" {
  # unnecessary details omitted - 1 VM with 1 external disk, fixed LUN of 0, Ubuntu 18.04
  vm_size = "Standard_B2S"

  cloud_init_template = data.template_cloudinit_config.config.rendered
}

The relevant part of the module (VM creation):

resource "azurerm_virtual_machine" "generic-vm" {
  count               = var.number
  name                = "${local.my_name}-${count.index}-vm"
  location            = var.location
  resource_group_name = var.resource_group_name

  network_interface_ids         = [azurerm_network_interface.generic-nic[count.index].id]
  vm_size                       = var.vm_size
  delete_os_disk_on_termination = true

  storage_image_reference {
    id = var.image_id
  }

  storage_os_disk {
    name              = "${local.my_name}-${count.index}-os"
    caching           = "ReadWrite"
    create_option     = "FromImage"
    managed_disk_type = "Standard_LRS"
    disk_size_gb      = var.os_disk_size
  }

  os_profile {
    computer_name  = "${local.my_name}-${count.index}"
    admin_username = local.my_admin_user_name
    custom_data    = var.cloud_init_template
  }

  os_profile_linux_config {
    disable_password_authentication = true

    ssh_keys {
      path = "/home/${local.my_admin_user_name}/.ssh/authorized_keys"
      //key_data = tls_private_key.vm_ssh_key.public_key_openssh
      key_data = var.public_key_openssh
    }
  }

  tags = {
    Name        = "${local.my_name}-${count.index}"
    Deployment  = local.my_deployment
    Prefix      = var.prefix
    Environment = var.env
    Location    = var.location
    Volatile    = var.volatile
    Terraform   = "true"
  }
}

resource "azurerm_managed_disk" "generic-disk" {
  name                 = "${azurerm_virtual_machine.generic-vm.*.name[0]}-1-generic-disk"
  location             = var.rg_location
  resource_group_name  = var.rg_name
  storage_account_type = "Standard_LRS"
  create_option        = "Empty"
  disk_size_gb         = var.external_disk_size
}

resource "azurerm_virtual_machine_data_disk_attachment" "generic-disk" {
  managed_disk_id    = azurerm_managed_disk.generic-disk.id
  virtual_machine_id = azurerm_virtual_machine.generic-vm.*.id[0]
  lun                = 0
  caching            = "ReadWrite"
}

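I get a lot of strange errors saying the disk doesn't exist when cloud-init runs. But when I SSH into the VM, the disk is there! Is this a race condition? Can I configure cloud-init to wait, or at least get better insight into what is happening?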

Relevant logs from the VM:

head -n 5000 /var/log/cloud-init.log | grep lun

2020-04-07 16:30:51,296 - cc_disk_setup.py[DEBUG]: Partitioning disks: {'/dev/disk/azure/scsi1/lun0': {'layout': True, 'overwrite': False, 'table_type': 'gpt'}, '/dev/disk/cloud/azure_resource': {'table_type': 'gpt', 'layout': [100], 'overwrite': True, '_origname': 'ephemeral0'}}
2020-04-07 16:30:51,318 - util.py[DEBUG]: Creating partition on /dev/disk/azure/scsi1/lun0 took 0.021 seconds
Device /dev/disk/azure/scsi1/lun0 did not exist and was not created with a udevadm settle.
Device /dev/disk/azure/scsi1/lun0 did not exist and was not created with a udevadm settle.
RuntimeError: Device /dev/disk/azure/scsi1/lun0 did not exist and was not created with a udevadm settle.
2020-04-07 16:30:51,601 - cc_disk_setup.py[DEBUG]: setting up filesystems: [{'device': '/dev/disk/azure/scsi1/lun0', 'filesystem': 'ext4', 'partition': 1}]
2020-04-07 16:30:51,725 - util.py[DEBUG]: Creating fs for /dev/disk/azure/scsi1/lun0 took 0.124 seconds
Device /dev/disk/azure/scsi1/lun0 did not exist and was not created with a udevadm settle.
Device /dev/disk/azure/scsi1/lun0 did not exist and was not created with a udevadm settle.
RuntimeError: Device /dev/disk/azure/scsi1/lun0 did not exist and was not created with a udevadm settle.
2020-04-07 16:30:51,733 - cc_mounts.py[DEBUG]: mounts configuration is [['/dev/disk/azure/scsi1/lun0-part1', '/opt/data', 'auto', 'defaults,noexec,nofail']]
2020-04-07 16:30:51,734 - cc_mounts.py[DEBUG]: Attempting to determine the real name of /dev/disk/azure/scsi1/lun0-part1
2020-04-07 16:30:51,734 - cc_mounts.py[DEBUG]: changed /dev/disk/azure/scsi1/lun0-part1 => None
2020-04-07 16:30:51,734 - cc_mounts.py[DEBUG]: Ignoring nonexistent named mount /dev/disk/azure/scsi1/lun0-part1
2020-04-07 16:30:51,736 - cc_mounts.py[DEBUG]: Changes to fstab: ['+ /dev/disk/azure/scsi1/lun0-part1 /opt/data auto defaults,noexec,nofail,comment=cloudconfig 0 2']

ls -l /dev/disk/azure/scsi1/lun0

lrwxrwxrwx 1 root root 12 Apr  7 16:32 /dev/disk/azure/scsi1/lun0 -> ../../../sdc
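The `ls -l` output shows the symlink stamped 16:32, roughly a minute after cloud-init's disk modules gave up at 16:30:51, which is consistent with the suspected race. On the "can cloud-init wait" question, a minimal sketch is to add a second cloud-config part whose `bootcmd` polls for the LUN 0 symlink. It assumes Ubuntu 18.04's default module order, where `bootcmd` runs before `disk_setup` and `mounts`, and that cloud-init merges the two cloud-config parts. It also assumes Azure reports the VM as provisioned (so Terraform goes on to create the attachment) before cloud-init reaches `bootcmd`; the logs above do not confirm that timing, so treat this as an untested workaround. The 60 x 5 s timeout is arbitrary.

```hcl
data "template_cloudinit_config" "config" {
  gzip          = true
  base64_encode = true

  # The original cloud-config (disk_setup, fs_setup, mounts).
  part {
    content_type = "text/cloud-config"
    content      = data.template_file.cloudconfig.rendered
  }

  # Assumed workaround: wait up to ~5 minutes for the LUN 0 symlink before the
  # disk modules run. bootcmd runs on every boot, but once the disk exists the
  # loop exits on its first check. "$(" is not Terraform interpolation, so the
  # shell command passes through unchanged.
  part {
    content_type = "text/cloud-config"
    content      = <<-EOT
      #cloud-config
      bootcmd:
        - [ sh, -c, "for i in $(seq 1 60); do [ -e /dev/disk/azure/scsi1/lun0 ] && break; sleep 5; done" ]
    EOT
  }
}
```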

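Comments:

- Can you show the contents of the module you created?
- Updated with the relevant part of the module.
- I don't see any storage_data_disk in the VM. How do you attach the data disk to the VM?
- Added that too!
- OK, I see it. You attach the data disk with a separate attachment resource. I think the ordering is the problem. Try adding the data disk to the VM resource itself, in a storage_data_disk block.

Answer 1: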

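I think the problem is the ordering of the data disk, the VM, and cloud-init. As far as I know, cloud-init runs when the VM boots for the first time. In your Terraform files the data disk is created and attached only after the VM exists, so it also shows up after cloud-init has already run, and that produces the errors.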

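So the solution is to declare the data disk inside the VM resource with a storage_data_disk block. The VM is then created with the data disk already attached, and cloud-init runs after that.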
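A minimal sketch of that change, reusing the module's own variables: the disk moves into a `storage_data_disk` block inside `azurerm_virtual_machine`, and the separate `azurerm_managed_disk` and `azurerm_virtual_machine_data_disk_attachment` resources are dropped. The disk name is hypothetical; `lun = 0` has to stay in sync with `/dev/disk/azure/scsi1/lun0` in the cloud-config.

```hcl
resource "azurerm_virtual_machine" "generic-vm" {
  # ... all existing arguments and blocks from the module stay as they are ...

  # Declared inline, so the disk is part of the VM create request and is
  # already attached at first boot, when cloud-init runs disk_setup/mounts.
  storage_data_disk {
    name              = "${local.my_name}-${count.index}-data-disk" # hypothetical name
    managed_disk_type = "Standard_LRS"
    create_option     = "Empty"
    disk_size_gb      = var.external_disk_size
    lun               = 0 # must match /dev/disk/azure/scsi1/lun0 in the cloud-config
    caching           = "ReadWrite"
  }
}

# azurerm_managed_disk "generic-disk" and
# azurerm_virtual_machine_data_disk_attachment "generic-disk" are removed,
# since the VM resource now creates and attaches the disk itself.
```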
