ZFS reports degraded health after physically removing an unrelated hard drive from the server

I have a home server running Debian 10 (Proxmox) on an NVMe SSD (ext4), plus two ZFS pools. The first pool is an 8x8TB RAID-Z2 array called vault; the other is a 2x1TB RAID 0 (striped) array called workspace.

I recently decided to get rid of workspace. I stopped all file operations against the filesystem, unmounted it, and then ran zfs destroy on the pool. I pulled one of the workspace drives out of the machine and rebooted.
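
For reference, the teardown amounted to roughly the following (a sketch of my setup, where /workspace was the mountpoint; zpool destroy is the pool-level equivalent of what I ran):

# Sketch only: stop anything writing to the pool, then tear it down.
umount /workspace          # unmount the filesystem
zpool destroy workspace    # discards the whole pool and its data (irreversible)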

When it came back up, I expected to see only the vault ZFS pool, and for it to be healthy. However, when I checked, I found that it is now in a DEGRADED state.

root@homenas:~# zpool status
  pool: vault
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid.  Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
  scan: scrub in progress since Tue Apr 21 19:03:01 2020
        2.35T scanned at 6.50G/s, 336G issued at 930M/s, 6.69T total
        0B repaired, 4.91% done, 0 days 01:59:31 to go
config:

        NAME                      STATE     READ WRITE CKSUM
        vault                     DEGRADED     0     0     0
          raidz2-0                DEGRADED     0     0     0
            sda                   ONLINE       0     0     0
            sdb                   ONLINE       0     0     0
            sdc                   ONLINE       0     0     0
            sdd                   ONLINE       0     0     0
            sde                   ONLINE       0     0     0
            sdf                   ONLINE       0     0     0
            15380939428578218220  FAULTED      0     0     0  was /dev/sdi1
            8563980037536059323   UNAVAIL      0     0     0  was /dev/sdj1

errors: No known data errors

I believe the drives may simply have been shuffled around in the /dev/sdX naming, although I do not know why one is now FAULTED and the other UNAVAIL. The vault ZFS pool is still active, and I have already backed up all of the recent data from the pool to another storage medium.
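
To check whether a simple renaming is to blame, the current /dev/sdX names can be mapped back to their persistent identifiers; a quick check using the standard udev symlinks (nothing here is specific to my setup):

# Show which /dev/sdX each persistent identifier currently points to
ls -l /dev/disk/by-id/ | grep -v part
# Or list serial numbers and WWNs directly for each block device
lsblk -o NAME,SIZE,SERIAL,WWN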

That said, is it possible to recover my vault pool and bring it back to a healthy state? If this was caused by some of the drives switching device names after the workspace pool was removed, what is the best option for restoring the array?

Here is the output I get from fdisk:

root@homenas:~# fdisk -l
Disk /dev/nvme0n1: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: WDBRPG5000ANC-WRSN
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start       End   Sectors   Size Type
/dev/nvme0n1p1      34      2047      2014  1007K Bios boot
/dev/nvme0n1p2    2048   1050623   1048576   512M EFI System
/dev/nvme0n1p3 1050624 976773134 975722511 465.3G Linux LVM


Disk /dev/sda: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: WDC WD80EMAZ-00W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sda1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sda9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdb: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdb1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdb9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdc: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdc1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdc9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdd: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdd1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdd9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sde: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sde1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sde9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdf: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdf1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdf9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdg: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WDC  WDBNCE0010P
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device          Start        End    Sectors   Size Type
/dev/sdg1        2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdg9  1953507328 1953523711      16384     8M Solaris reserved 1


Disk /dev/sdh: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: WDC WD80EMAZ-00W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdh1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdh9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/sdi: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123

Device           Start         End     Sectors  Size Type
/dev/sdi1         2048 15628036095 15628034048  7.3T Solaris /usr & Apple ZFS
/dev/sdi9  15628036096 15628052479       16384    8M Solaris reserved 1


Disk /dev/mapper/pve-swap: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-root: 96 GiB, 103079215104 bytes, 201326592 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes


Disk /dev/mapper/pve-vm--100--disk--0: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disklabel type: dos
Disk identifier: 0x97e9a0ea

Device                                 Boot Start      End  Sectors Size Id Type
/dev/mapper/pve-vm--100--disk--0-part1 *     2048 67106815 67104768  32G 83 Linux

I would appreciate any information you can provide.

Answer

I was able to resolve the issue and return the RAID array to a healthy state without losing any data. The root cause was that the pool referenced the drives by device path (/dev/sdX), which udev can reassign, rather than by their stable disk identifiers; re-importing the pool by identifier fixed it.
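
To confirm that a pool really is referencing bare device paths before touching anything, the recorded vdev paths can be listed directly (a quick check; -P should print the full paths on any reasonably recent ZFS on Linux):

# Entries such as /dev/sdi1 mean the pool is tied to raw device paths,
# which udev is free to reassign across reboots.
zpool status -P vault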

In case anyone else runs into a faulted ZFS pool after adding or removing an unrelated disk, and determines that they, too, created the pool with device paths instead of stable identifiers, here are the steps I followed.

1. Back up your data. Make sure you have a complete copy of the pool's data before proceeding.

2. Stop all writes to the filesystem (in my case that meant stopping my Docker containers).

docker stop $(docker ps -aq)

3. Unmount the filesystem.

umount /vault

4. Export the ZFS pool.

zpool export vault

5. Import the ZFS pool again, this time using the disk identifiers rather than the device paths.

zpool import -d /dev/disk/by-id vault

6. Check the status of the ZFS pool. It should now report as healthy.

root@homenas:~# zpool status
  pool: vault
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: resilvered 73.5M in 0 days 00:00:07 with 0 errors on Wed Apr 22 10:38:47 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        vault                       ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
            wwn-0xabcdefghijklmnoq  ONLINE       0     0    25
            wwn-0xabcdefghijklmnoq  ONLINE       0     0    16

errors: No known data errors

You can see that a small amount of data was resilvered during the re-import, but it completed within a few seconds.

7. Run zpool clear to reset the pool's error flags.

zpool clear vault

8. Run a scrub to make sure the ZFS pool is in good shape.

zpool scrub vault

9. Check the status again to confirm the pool is healthy.

root@homenas:~# zpool status
  pool: vault
 state: ONLINE
  scan: scrub repaired 0B in 0 days 01:59:47 with 0 errors on Wed Apr 22 12:46:58 2020
config:

        NAME                        STATE     READ WRITE CKSUM
        vault                       ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x5000cca252de17d4  ONLINE       0     0     0
            wwn-0x5000c500c4e46bf9  ONLINE       0     0     0
            wwn-0x5000c500c4e65198  ONLINE       0     0     0
            wwn-0x5000c500c4e616a4  ONLINE       0     0     0
            wwn-0x5000c500c4ac129e  ONLINE       0     0     0
            wwn-0x5000c500c4e3f74a  ONLINE       0     0     0
            wwn-0x5000cca257eb9299  ONLINE       0     0     0
            wwn-0x5000c500c4e50efc  ONLINE       0     0     0

errors: No known data errors
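
As a closing note, the same lesson applies when building a pool in the first place: reference the drives by stable identifiers so that later hardware changes cannot shuffle them. A hypothetical sketch (the pool name and wwn-... values below are placeholders, not my actual drives):

# Hypothetical: create a RAID-Z2 pool directly on by-id paths.
# List every member drive; the wwn names here are made up.
zpool create newpool raidz2 \
  /dev/disk/by-id/wwn-0x5000000000000001 \
  /dev/disk/by-id/wwn-0x5000000000000002 \
  /dev/disk/by-id/wwn-0x5000000000000003 \
  /dev/disk/by-id/wwn-0x5000000000000004

Since the cachefile records the paths that were used at import time, a pool imported with -d /dev/disk/by-id should also keep coming up under those names after future reboots.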
