ZFS reports degraded health after physically removing unrelated hard drive from server


【Title】: ZFS reports degraded health after physically removing unrelated hard drive from server 【Published】: 2020-08-04 22:52:28 【Question】:

I have a home server running Debian 10 (Proxmox) on an NVMe SSD (ext4), with two ZFS pools. The first pool, named vault, is an 8x8TB RAID-Z2 array; the other, named workspace, is a 2x1TB RAID 0 array.

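I recently decided to get rid of workspace. I stopped all file operations on the filesystem, unmounted it, and then ran zfs destroy on the pool. I physically removed one of the workspace drives and rebooted the machine.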

After the reboot, I expected to see only the vault ZFS pool, in a healthy state. However, when I checked, it was in a DEGRADED state.

root@homenas:~# zpool status
  pool: vault
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub in progress since Tue Apr 21 19:03:01 2020
        2.35T scanned at 6.50G/s, 336G issued at 930M/s, 6.69T total
        0B repaired, 4.91% done, 0 days 01:59:31 to go
config:

        NAME                      STATE     READ WRITE CKSUM
        vault                     DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            sda                   ONLINE       0     0     0
            sdb                   ONLINE       0     0     0
            sdc                   ONLINE       0     0     0
            sdd                   ONLINE       0     0     0
            sde                   ONLINE       0     0     0
            sdf                   ONLINE       0     0     0
            15380939428578218220  FAULTED      0     0     0  was /dev/sdi1
            8563980037536059323   UNAVAIL      0     0     0  was /dev/sdj1

errors: No known data errors

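I suspect the drives were reassigned to different /dev/sdX paths. I'm not sure why one shows as FAULTED and the other as UNAVAIL. The vault ZFS pool is still active, and I'm already running a backup to copy all recent data to another storage medium.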
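One way to check whether the kernel names have shifted is to list the persistent names under /dev/disk/by-id next to the sdX nodes they currently resolve to. The following is only a minimal sketch: `map_by_id` is a hypothetical helper name (not part of ZFS or udev), and it assumes a `readlink` that supports `-f`, as on Debian.

```shell
#!/bin/sh
# map_by_id is a hypothetical helper name, not part of ZFS or udev.
# Prints "<persistent name> -> <resolved device>" for every symlink in a directory.
map_by_id() {
    dir="${1:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (this also covers an empty directory).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink -f "$link")"
    done
}

map_by_id /dev/disk/by-id
```

Comparing this listing before and after a hardware change shows which sdX name each physical disk ended up with.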

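Is it possible to repair my vault pool and bring it back to a healthy state? If this was caused by drive paths shifting after the workspace pool was removed, what is the best way to recover the array?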

Here is the information I get from fdisk:

root@homenas:~# fdisk -l
Disk /dev/nvme0n1: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: WDBRPG5000ANC-WRSN
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start       End   Sectors   Size Type
/dev/nvme0n1p1      34      2047      2014  1007K Bios boot
/dev/nvme0n1p2    2048   1050623   1048576   512M EFI System
/dev/nvme0n1p3 1050624 976773134 975722511 465.3G Linux LVM


Disk /dev/sda: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: WDC WD80EMAZ-00W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sda1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sda9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdb: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdb1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdb9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdc: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdc1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdc9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdd: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdd1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdd9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sde: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sde1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sde9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdf: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdf1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdf9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdg: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WDC  WDBNCE0010P
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device          Start        End    Sectors   Size Type
/dev/sdg1        2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdg9  1953507328 1953523711      16384     8M Solaris reserved 1


Disk /dev/sdh: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: WDC WD80EMAZ-00W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdh1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdh9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdi: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdi1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdi9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/mapper/pve-swap: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-root: 96 GiB, 103079215104 bytes, 201326592 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-vm--100--disk--0: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disklabel type: dos
Disk identifier: 0x97e9a0ea

Device                                 Boot Start      End  Sectors Size Id Type
/dev/mapper/pve-vm--100--disk--0-part1 *     2048 67106815 67104768  32G 83 Linux

Thanks for any information you can provide.

【Answer 1】:

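I was able to fix the problem and bring the RAID array back to a healthy state without losing any data. I resolved it by importing the pool with disk identifiers instead of the disk paths assigned by udev.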
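To tell whether a pool is currently using kernel device names, one option is to scan the config section of `zpool status` for entries that look like sdX. This is a rough sketch under stated assumptions: `kernel_named_members` is a hypothetical helper name, and the pattern only covers SCSI/SATA-style names such as sda or sdb1.

```shell
#!/bin/sh
# kernel_named_members is a hypothetical helper name.
# Reads `zpool status` output on stdin and prints config entries that look like
# kernel-assigned names (sda, sdb1, ...) rather than persistent identifiers.
kernel_named_members() {
    awk '$1 ~ /^sd[a-z]+[0-9]*$/ && NF >= 5 { print $1 }'
}

# Sample lines taken from the status output in the question.
kernel_named_members <<'EOF'
        NAME                      STATE     READ WRITE CKSUM
        vault                     DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            sda                   ONLINE       0     0     0
            15380939428578218220  FAULTED      0     0     0  was /dev/sdi1
EOF
```

On a live system the function would be fed with `zpool status vault | kernel_named_members`; any output suggests the pool members are referenced by names that can change across reboots.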

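In case anyone else sees a ZFS pool fail after adding or removing an unrelated disk, and determines that they also originally created the pool with disk paths rather than stable identifiers, here are the steps I followed.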

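    Back up your data; make sure you have a complete copy of the pool's data before continuing.

    Stop all writes to the filesystem (in my case, I was using Docker):

docker stop $(docker ps -aq)
    Unmount the filesystem:
umount /vault
    Export the ZFS pool:
zpool export vault
    Import the ZFS pool again, this time using disk identifiers instead of disk paths:
zpool import -d /dev/disk/by-id vault
    Check the status of the ZFS pool. The pool should now appear healthy: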
root@homenas:~# zpool status
  pool: vault
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 73.5M in 0 days 00:00:07 with 0 errors on Wed Apr 22 10:38:47 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        vault                       ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0    25
            wwn-0xabcdefghijklmnoq  ONLINE       0     0    16

errors: No known data errors

We can see that some data was resilvered during the re-import, but this completed within a few seconds.

    Run clear to reset the error flags in ZFS:
zpool clear vault
    Run a scrub to make sure the ZFS pool is in good condition:
zpool scrub vault
    Check status again to confirm the pool is healthy:
root@homenas:~# zpool status
  pool: vault
 state: ONLINE
  scan: scrub repaired 0B in 0 days 01:59:47 with 0 errors on Wed Apr 22 12:46:58 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        vault                       ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000cca252de17d4  ONLINE       0     0     0
            wwn-0x5000c500c4e46bf9  ONLINE       0     0     0
            wwn-0x5000c500c4e65198  ONLINE       0     0     0
            wwn-0x5000c500c4e616a4  ONLINE       0     0     0
            wwn-0x5000c500c4ac129e  ONLINE       0     0     0
            wwn-0x5000c500c4e3f74a  ONLINE       0     0     0
            wwn-0x5000cca257eb9299  ONLINE       0     0     0
            wwn-0x5000c500c4e50efc  ONLINE       0     0     0

errors: No known data errors
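The export and re-import steps above could be collected into a small script. This is a sketch, not a definitive procedure: `reimport_by_id`, `run`, and the `DRY_RUN` variable are hypothetical names introduced for illustration, and the script defaults to printing the commands instead of executing them so that it can be reviewed first.

```shell
#!/bin/sh
# reimport_by_id, run and DRY_RUN are hypothetical names used for illustration.
# With DRY_RUN=1 (the default here) the commands are only printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

reimport_by_id() {
    pool="$1"
    [ -n "$pool" ] || { echo "usage: reimport_by_id POOL" >&2; return 1; }
    # Export first so the pool can be imported again under different device names.
    run zpool export "$pool" || return 1
    # -d tells zpool import which directory to search for member devices.
    run zpool import -d /dev/disk/by-id "$pool" || return 1
    run zpool status "$pool"
}

reimport_by_id vault
```

Setting `DRY_RUN=0` would execute the commands for real; that should only be done as root, after the backup has completed and all writers to the pool have been stopped.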
