Pool Testing
Best Practices
Best practice from Oracle says a VDev should be less than 9 disks [1]. So given 24 disks, you should have 3 VDevs. They further recommend the following ratios of parity to data:
- single-parity starting at 3 disks (2+1)
- double-parity starting at 6 disks (4+2)
- triple-parity starting at 9 disks (6+3)
It is not recommended to create a zpool with a single large vdev, say 20 disks, because write IOPS performance will be that of a single disk, and resilver times will be very long (possibly weeks with future large drives).
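Putting those rules together for a 24-bay system, you end up with something like three 8-disk RAIDZ2 VDevs (6 data + 2 parity each). A sketch only - the device names are placeholders for whatever your system shows:

zpool create \
  -f -m /srv srv \
  raidz2 sdb sdc sdd sde sdf sdg sdh sdi \
  raidz2 sdj sdk sdl sdm sdn sdo sdp sdq \
  raidz2 sdr sds sdt sdu sdv sdw sdx sdy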
Reasons For These Practices
I interpret this as meaning that when a single IO write operation is given to a VDev, it won’t accept another until it’s done. But if you have multiple VDevs, you can hand out writes to the other VDevs while you’re waiting on the first. Reading is probably unaffected, but writes will be faster with more VDevs.
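You can watch this happen on a live pool: zpool iostat with -v breaks the numbers out per VDev, so on a multi-VDev pool you can see the writes being spread across them.

zpool iostat -v srv 5   # per-VDev operations and bandwidth, refreshed every 5 seconds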
Also, when resilvering the array, you have to read from every surviving drive in the VDev to reconstruct the missing data. If you have 24 drives in a VDev, then you have to read a block of data from all of the other 23 drives to rebuild each block. If you have only 8, then you have roughly a third as much data to read. Meanwhile, the rest of the VDevs are available for real work.
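As a quick back-of-the-envelope illustration (the 1 TB of allocated data per surviving disk is a made-up figure), the amount you have to read scales with the VDev width:

# rough sketch: data read from survivors to rebuild one failed disk
per_disk_tb=1                       # hypothetical allocated data per surviving disk
for width in 8 24; do
  echo "${width}-wide vdev: read ~$(( (width - 1) * per_disk_tb )) TB"
done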
Rebuilding the array also introduces stress which can cause other disks to fail, so it’s best to limit that to a smaller set of drives. I’ve heard many times of resilvering pushing sister drives that were already on the edge over it and failing the array.
Calculating Failure Rates
You can calculate the failure rates of different configurations with an on-line tool [2]. The chart scales the X axis by 50, so the differences in failure rates are not as large as they appear - but if it didn’t, you wouldn’t be able to see the lines. In most cases there’s not a large difference between, say, a 4x9 and a 3x12.
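To get a feel for why the lines sit so close together, here’s a crude sketch you can run in the shell. This is not the calculator’s model - it assumes disk failures are independent, and F (the chance a given disk dies while the VDev is exposed) is a number I made up:

F=0.02                    # assumed per-disk failure probability during the exposure window
compare() {               # compare <vdevs> <disks-per-vdev> <parity>
  awk -v v="$1" -v n="$2" -v p="$3" -v f="$F" 'BEGIN {
    k = p + 1                                                # losing k disks in one vdev loses the pool
    c = 1; for (i = 0; i < k; i++) c = c * (n - i) / (i + 1) # C(n, k)
    vdev = c * f^k                                           # one vdev losing k disks
    pool = 1 - (1 - vdev)^v                                  # any vdev losing k disks
    printf "%dx%d RAIDZ%d: ~%.3f%% chance of pool loss\n", v, n, p, pool * 100
  }'
}
compare 4 9 2
compare 3 12 2

Under those made-up numbers the 3x12 comes out roughly twice as likely to lose data as the 4x9 - a real difference, but small in absolute terms either way, which matches the shape of the charts.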
When To Use a Hot Spare
Given 9 disks where one fails, is it better to drop from 3 parity to 2 and run in degraded mode, or to have 2 parity that drops to 1 plus a spare that recovers without intervention? The math [2] says it’s better to have the parity. But what about speed? When you lose a disk, 1 out of every 9 IOPS requires reconstructing data from parity. Anecdotally, the observed performance penalty is minor. So the only times to use a hot spare are:
- When you have unused capacity in RAIDZ3 (i.e. almost never)
- When IOPS require a mirror pool
Say you have 16 bays of 4TB drives. A 2x8 Z2 config gives you 48TB but you only want 32TB. Change that to a 2x8 Z3 and get 40TB. Still only need 32TB? Change that to a 2x7 Z3 with 2 hot spares. Now you have 32TB with maximum protection and the insurance of an automatic replacement.
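Those capacity numbers are just vdevs x (disks per vdev - parity) x drive size; a quick shell check for the 4TB layouts above:

usable() { echo "$(( $1 * ($2 - $3) * 4 ))TB usable for a ${1}x$2 Z$3"; }   # vdevs, disks per vdev, parity
usable 2 8 2   # 48TB
usable 2 8 3   # 40TB
usable 2 7 3   # 32TB, leaving 2 bays free for hot spares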
Or maybe you have a 37-bay system: build whatever layout adds up to 36 drives and keep the odd one as a spare.
The other case is when your IOPS demands push past what RAIDZ can do and you must use a mirror pool. A failure there loses all redundancy, and a hot spare is your only option.
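When you do decide to keep a spare, attaching one to an existing pool is a one-liner (the pool and device names here are just examples); on Linux it’s the ZFS event daemon (zed) that actually kicks the spare in when a member faults.

sudo zpool add srv spare sdz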
When To Use a Distributed Spare
A distributed spare recovers from a disk loss in half the time [3], and is always better than a dedicated spare - though you should almost never use a spare anyway. The only time to use a normal hot spare is when you want a single global spare for the whole pool.
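In OpenZFS, distributed spares come with dRAID (version 2.1 or later). A sketch of a 24-disk dRAID2 with two distributed spares - the data/children/spare counts and device names are illustrative, not a recommendation:

zpool create \
  -f -m /srv srv \
  draid2:8d:24c:2s \
  sdb sdc sdd sde sdf sdg sdh sdi \
  sdj sdk sdl sdm sdn sdo sdp sdq \
  sdr sds sdt sdu sdv sdw sdx sdy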
Testing Speed
The speed difference isn’t charted, so let’s test that ourselves.
Given 24 disks, and deciding to live dangerously, should you have a single 24-disk VDev with three parity disks, or three VDevs with a single parity disk each? The case for the former is better resiliency; for the latter, better write speed and faster recovery from disk failures.
Build a 3-Wide RAIDZ1
Create the pool across 24 disks
zpool create \
-f -m /srv srv \
raidz sdb sdc sdd sde sdf sdg sdh sdi \
raidz sdj sdk sdl sdm sdn sdo sdp sdq \
raidz sdr sds sdt sdu sdv sdw sdx sdy
Now copy a lot of random data to it
#!/bin/bash
# Fill the pool with incompressible data: ~1000 files of 1 GiB of random bytes.
# Run this from /srv so the files land on the pool.
no_of_files=1000
counter=0
while [[ $counter -le $no_of_files ]]; do
    echo "Creating file no $counter"
    touch "random-file.$counter"
    shred -n 1 -s 1G "random-file.$counter"   # one pass of random data, sized to 1 GiB
    (( counter += 1 ))
done
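Once that finishes, confirm how much data actually landed on the pool:

zpool list srv   # the ALLOC column shows how much of the pool is in use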
Now yank (literally) one of the physical disks and replace it
allen@server:~$ sudo zpool status
pool: srv
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: none requested
config:
NAME                       STATE     READ WRITE CKSUM
srv                        DEGRADED     0     0     0
  raidz1-0                 DEGRADED     0     0     0
    sdb                    ONLINE       0     0     0
    6847353731192779603    UNAVAIL      0     0     0  was /dev/sdc1
    sdd                    ONLINE       0     0     0
    sde                    ONLINE       0     0     0
    sdf                    ONLINE       0     0     0
    sdg                    ONLINE       0     0     0
    sdh                    ONLINE       0     0     0
    sdi                    ONLINE       0     0     0
  raidz1-1                 ONLINE       0     0     0
    sdr                    ONLINE       0     0     0
    sds                    ONLINE       0     0     0
    sdt                    ONLINE       0     0     0
    sdu                    ONLINE       0     0     0
    sdj                    ONLINE       0     0     0
    sdk                    ONLINE       0     0     0
    sdl                    ONLINE       0     0     0
    sdm                    ONLINE       0     0     0
  raidz1-2                 ONLINE       0     0     0
    sdv                    ONLINE       0     0     0
    sdw                    ONLINE       0     0     0
    sdx                    ONLINE       0     0     0
    sdy                    ONLINE       0     0     0
    sdn                    ONLINE       0     0     0
    sdo                    ONLINE       0     0     0
    sdp                    ONLINE       0     0     0
    sdq                    ONLINE       0     0     0
errors: No known data errors
allen@server:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 465.8G 0 disk
├─sda1 8:1 0 449.9G 0 part /
├─sda2 8:2 0 1K 0 part
└─sda5 8:5 0 15.9G 0 part [SWAP]
sdb 8:16 1 931.5G 0 disk
├─sdb1 8:17 1 931.5G 0 part
└─sdb9 8:25 1 8M 0 part
sdc 8:32 1 931.5G 0 disk
sdd 8:48 1 931.5G 0 disk
├─sdd1 8:49 1 931.5G 0 part
└─sdd9 8:57 1 8M 0 part
...
The replacement disk shows up as a bare sdc with no partitions, so tell ZFS to swap it in for the missing member (referenced by its GUID):
sudo zpool replace srv 6847353731192779603 /dev/sdc -f
allen@server:~$ sudo zpool status
pool: srv
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Fri Mar 22 15:50:21 2019
131G scanned out of 13.5T at 941M/s, 4h7m to go
5.40G resilvered, 0.95% done
config:
NAME                         STATE     READ WRITE CKSUM
srv                          DEGRADED     0     0     0
  raidz1-0                   DEGRADED     0     0     0
    sdb                      ONLINE       0     0     0
    replacing-1              OFFLINE      0     0     0
      6847353731192779603    OFFLINE      0     0     0  was /dev/sdc1/old
      sdc                    ONLINE       0     0     0  (resilvering)
    sdd                      ONLINE       0     0     0
    sde                      ONLINE       0     0     0
    sdf                      ONLINE       0     0     0
...
A few hours later…
$ sudo zpool status
pool: srv
state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the entire pool from backup.
see: http://zfsonlinux.org/msg/ZFS-8000-8A
scan: resilvered 571G in 5h16m with 2946 errors on Fri Mar 22 21:06:48 2019
config:
NAME                         STATE     READ WRITE CKSUM
srv                          DEGRADED   208     0 2.67K
  raidz1-0                   DEGRADED   208     0 5.16K
    sdb                      ONLINE       0     0     0
    replacing-1              OFFLINE      0     0     0
      6847353731192779603    OFFLINE      0     0     0  was /dev/sdc1/old
      sdc                    ONLINE       0     0     0
    sdd                      ONLINE     208     0     1
    sde                      ONLINE       0     0     0
    sdf                      ONLINE       0     0     0
    sdg                      ONLINE       0     0     0
    sdh                      ONLINE       0     0     0
    sdi                      ONLINE       0     0     0
  raidz1-1                   ONLINE       0     0     0
    sdr                      ONLINE       0     0     0
    sds                      ONLINE       0     0     0
    sdt                      ONLINE       0     0     0
    sdu                      ONLINE       0     0     0
    sdj                      ONLINE       0     0     1
    sdk                      ONLINE       0     0     1
    sdl                      ONLINE       0     0     0
    sdm                      ONLINE       0     0     0
  raidz1-2                   ONLINE       0     0     0
    sdv                      ONLINE       0     0     0
    sdw                      ONLINE       0     0     0
    sdx                      ONLINE       0     0     0
    sdy                      ONLINE       0     0     0
    sdn                      ONLINE       0     0     0
    sdo                      ONLINE       0     0     0
    sdp                      ONLINE       0     0     0
    sdq                      ONLINE       0     0     0
The time was 5h16m. But notice the errors - during the resilver, drive sdd had 208 read errors and data was lost. This is the classic RAID failure mode: resilvering stresses the remaining drives, another one goes bad, and you can’t fully restore.
It’s somewhat questionable whether this is a valid test, as the effect of the error on resilvering duration is unknown. But on with the test.
Let’s wipe that away and create a raidz3
sudo zpool destroy srv
zpool create \
-f -m /srv srv \
raidz3 \
sdb sdc sdd sde sdf sdg sdh sdi \
sdj sdk sdl sdm sdn sdo sdp sdq \
sdr sds sdt sdu sdv sdw sdx sdy
zdb                                        # dump the pool config, including each disk's guid
zpool offline srv 15700807100581040709     # take the victim disk offline
sudo zpool replace srv 15700807100581040709 sdc
allen@server:~$ sudo zpool status
pool: srv
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Mar 24 10:07:18 2019
27.9G scanned out of 9.14T at 362M/s, 7h19m to go
1.21G resilvered, 0.30% done
config:
NAME               STATE     READ WRITE CKSUM
srv                DEGRADED     0     0     0
  raidz3-0         DEGRADED     0     0     0
    sdb            ONLINE       0     0     0
    replacing-1    OFFLINE      0     0     0
      sdd          OFFLINE      0     0     0
      sdc          ONLINE       0     0     0  (resilvering)
    sde            ONLINE       0     0     0
    sdf            ONLINE       0     0     0
...
allen@server:~$ sudo zpool status
pool: srv
state: ONLINE
scan: resilvered 405G in 6h58m with 0 errors on Sun Mar 24 17:05:50 2019
config:
NAME          STATE     READ WRITE CKSUM
srv           ONLINE       0     0     0
  raidz3-0    ONLINE       0     0     0
    sdb       ONLINE       0     0     0
    sdc       ONLINE       0     0     0
    sde       ONLINE       0     0     0
    sdf       ONLINE       0     0     0
    sdg       ONLINE       0     0     0
...
The time? 6h58m. Longer, but safer.