ZFS reports degraded health after physically removing unrelated hard drive from server
【Posted】: 2020-08-04 22:52:28
【Question】: I have a home server running Debian 10 (Proxmox) on an NVMe SSD (ext4), plus two ZFS pools: an 8x8TB RAID-Z2 array named vault and a 2x1TB RAID 0 array named workspace.
I recently decided to get rid of workspace. I stopped all file operations against the filesystem, unmounted it, and then ran zfs destroy on the pool. I then physically removed one of the workspace drives and rebooted the machine.
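For reference, the removal was roughly the following sequence (reconstructed from memory, so the mountpoint and the exact destroy command I typed are approximate):

docker stop $(docker ps -aq)   # stop anything still writing to the pool (Docker, in my case)
umount /workspace              # mountpoint is approximate
zpool destroy workspace        # 'zpool destroy' removes the whole pool; 'zfs destroy' removes individual datasets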
When the machine came back up, I expected to see only the vault ZFS pool, and for it to be healthy. However, when I checked, I found it was now in a DEGRADED state:
root@homenas:~# zpool status
pool: vault
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub in progress since Tue Apr 21 19:03:01 2020
2.35T scanned at 6.50G/s, 336G issued at 930M/s, 6.69T total
0B repaired, 4.91% done, 0 days 01:59:31 to go
config:
NAME                        STATE     READ WRITE CKSUM
vault                       DEGRADED     0     0     0
  raidz2-0                  DEGRADED     0     0     0
    sda                     ONLINE       0     0     0
    sdb                     ONLINE       0     0     0
    sdc                     ONLINE       0     0     0
    sdd                     ONLINE       0     0     0
    sde                     ONLINE       0     0     0
    sdf                     ONLINE       0     0     0
    15380939428578218220    FAULTED      0     0     0  was /dev/sdi1
    8563980037536059323     UNAVAIL      0     0     0  was /dev/sdj1
errors: No known data errors
I believe the drives may have been reassigned to different /dev/sdX paths. I'm not sure why one shows up as FAULTED and the other as UNAVAIL. The vault pool is still active, and I'm already running a backup to copy all recent data off to another storage medium.

Still, is it possible to recover my vault pool and bring it back to health? If this was caused by the device names shifting after the workspace pool was removed, what is my best option for restoring the array?
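To confirm that the drives merely changed names rather than actually failing, I assume I can check where the stable identifiers currently point and whether the ZFS labels are still readable, something along these lines (the device name is just an example from my current layout):

ls -l /dev/disk/by-id/ | grep wwn   # map the stable WWN identifiers to the current sdX names
zdb -l /dev/sdh1                    # dump the ZFS vdev label on a partition I suspect belongs to vault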
Here is the output I get from fdisk:
root@homenas:~# fdisk -l
Disk /dev/nvme0n1: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Disk model: WDBRPG5000ANC-WRSN
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: XXXX123
Device Start End Sectors Size Type
/dev/nvme0n1p1 34 2047 2014 1007K Bios boot
/dev/nvme0n1p2 2048 1050623 1048576 512M EFI System
/dev/nvme0n1p3 1050624 976773134 975722511 465.3G Linux LVM
Disk /dev/sda: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: WDC WD80EMAZ-00W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123
Device Start End Sectors Size Type
/dev/sda1 2048 15628036095 15628034048 7.3T Solaris /usr & Apple ZFS
/dev/sda9 15628036096 15628052479 16384 8M Solaris reserved 1
Disk /dev/sdb: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123
Device Start End Sectors Size Type
/dev/sdb1 2048 15628036095 15628034048 7.3T Solaris /usr & Apple ZFS
/dev/sdb9 15628036096 15628052479 16384 8M Solaris reserved 1
Disk /dev/sdc: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123
Device Start End Sectors Size Type
/dev/sdc1 2048 15628036095 15628034048 7.3T Solaris /usr & Apple ZFS
/dev/sdc9 15628036096 15628052479 16384 8M Solaris reserved 1
Disk /dev/sdd: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123
Device Start End Sectors Size Type
/dev/sdd1 2048 15628036095 15628034048 7.3T Solaris /usr & Apple ZFS
/dev/sdd9 15628036096 15628052479 16384 8M Solaris reserved 1
Disk /dev/sde: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123
Device Start End Sectors Size Type
/dev/sde1 2048 15628036095 15628034048 7.3T Solaris /usr & Apple ZFS
/dev/sde9 15628036096 15628052479 16384 8M Solaris reserved 1
Disk /dev/sdf: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123
Device Start End Sectors Size Type
/dev/sdf1 2048 15628036095 15628034048 7.3T Solaris /usr & Apple ZFS
/dev/sdf9 15628036096 15628052479 16384 8M Solaris reserved 1
Disk /dev/sdg: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Disk model: WDC WDBNCE0010P
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: XXXX123
Device Start End Sectors Size Type
/dev/sdg1 2048 1953507327 1953505280 931.5G Solaris /usr & Apple ZFS
/dev/sdg9 1953507328 1953523711 16384 8M Solaris reserved 1
Disk /dev/sdh: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: WDC WD80EMAZ-00W
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123
Device Start End Sectors Size Type
/dev/sdh1 2048 15628036095 15628034048 7.3T Solaris /usr & Apple ZFS
/dev/sdh9 15628036096 15628052479 16384 8M Solaris reserved 1
Disk /dev/sdi: 7.3 TiB, 8001563222016 bytes, 15628053168 sectors
Disk model: ST8000DM004-2CX1
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXX123
Device Start End Sectors Size Type
/dev/sdi1 2048 15628036095 15628034048 7.3T Solaris /usr & Apple ZFS
/dev/sdi9 15628036096 15628052479 16384 8M Solaris reserved 1
Disk /dev/mapper/pve-swap: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/pve-root: 96 GiB, 103079215104 bytes, 201326592 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk /dev/mapper/pve-vm--100--disk--0: 32 GiB, 34359738368 bytes, 67108864 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 65536 bytes / 65536 bytes
Disklabel type: dos
Disk identifier: 0x97e9a0ea
Device Boot Start End Sectors Size Id Type
/dev/mapper/pve-vm--100--disk--0-part1 * 2048 67106815 67104768 32G 83 Linux
Thanks in advance for any information.
【Answer 1】: I was able to resolve the issue and bring the RAID array back to a healthy state without losing any data. The fix was to re-import the pool using stable disk identifiers instead of the udev-assigned disk paths (/dev/sdX).

The steps below are for anyone else who runs into a faulted ZFS pool after adding or removing an unrelated disk and finds that they, too, originally created the pool using disk paths rather than stable identifiers.
1. Back up your data: make sure you have a complete copy of the pool's data before proceeding.

2. Stop all writes to the filesystem (in my case that meant stopping Docker):

docker stop $(docker ps -aq)

3. Unmount the filesystem:

umount /vault

4. Export the ZFS pool:

zpool export vault

5. Re-import the ZFS pool, this time using the disk identifiers instead of the disk paths:

zpool import -d /dev/disk/by-id vault

6. Check the status of the ZFS pool. It should now look healthy:
root@homenas:~# zpool status
pool: vault
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-9P
scan: resilvered 73.5M in 0 days 00:00:07 with 0 errors on Wed Apr 22 10:38:47 2020
config:
NAME                        STATE     READ WRITE CKSUM
vault                       ONLINE       0     0     0
  raidz2-0                  ONLINE       0     0     0
    wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
    wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
    wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
    wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
    wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
    wwn-0xabcdefghijklmnoq  ONLINE       0     0     0
    wwn-0xabcdefghijklmnoq  ONLINE       0     0    25
    wwn-0xabcdefghijklmnoq  ONLINE       0     0    16
errors: No known data errors
We can see that a small amount of data was resilvered during the re-import, but it completed within a few seconds.

7. Run zpool clear to reset the error flags:

zpool clear vault

8. Run a scrub to make sure the pool is in good shape:

zpool scrub vault

9. Check the status once more to confirm the pool is healthy:
root@homenas:~# zpool status
pool: vault
state: ONLINE
scan: scrub repaired 0B in 0 days 01:59:47 with 0 errors on Wed Apr 22 12:46:58 2020
config:
NAME                        STATE     READ WRITE CKSUM
vault                       ONLINE       0     0     0
  raidz2-0                  ONLINE       0     0     0
    wwn-0x5000cca252de17d4  ONLINE       0     0     0
    wwn-0x5000c500c4e46bf9  ONLINE       0     0     0
    wwn-0x5000c500c4e65198  ONLINE       0     0     0
    wwn-0x5000c500c4e616a4  ONLINE       0     0     0
    wwn-0x5000c500c4ac129e  ONLINE       0     0     0
    wwn-0x5000c500c4e3f74a  ONLINE       0     0     0
    wwn-0x5000cca257eb9299  ONLINE       0     0     0
    wwn-0x5000c500c4e50efc  ONLINE       0     0     0
errors: No known data errors
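One final note on prevention: the underlying cause was that the pool had originally been created against bare /dev/sdX paths, which are not stable across reboots or hardware changes. Anyone building a new pool can avoid this entirely by referencing /dev/disk/by-id paths from the start. A hypothetical sketch (the pool name and wwn-* values are placeholders, not my actual drives):

zpool create newpool raidz2 \
  /dev/disk/by-id/wwn-0x5000000000000001 \
  /dev/disk/by-id/wwn-0x5000000000000002 \
  /dev/disk/by-id/wwn-0x5000000000000003 \
  /dev/disk/by-id/wwn-0x5000000000000004

For an existing pool, the export / import -d /dev/disk/by-id cycle shown above achieves the same result without recreating anything.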