Pool Testing

Best Practices

Best practice from Oracle says a VDev should have fewer than 9 disks1. So given 24 disks, you should have 3 VDevs. They further recommend the following amounts of parity vs data:

  • single-parity starting at 3 disks (2+1)
  • double-parity starting at 6 disks (4+2)
  • triple-parity starting at 9 disks (6+3)

It is not recommended to create a zpool with a single large vdev, say 20 disks, because write IOPS performance will be that of a single disk, and resilver times will be very long (possibly weeks with future large drives).
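
For example, a minimal sketch of a 24-disk pool that follows these guidelines would be three 8-disk double-parity VDevs (the device names are just placeholders):

zpool create \
-f -m /srv srv \
raidz2 sdb sdc sdd sde sdf sdg sdh sdi \
raidz2 sdj sdk sdl sdm sdn sdo sdp sdq \
raidz2 sdr sds sdt sdu sdv sdw sdx sdy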

Reasons For These Practices

I interpret this as meaning that when a single IO write operation is given to a VDev, it won’t write anything else until it’s done. But if you have multiple VDevs, you can hand out writes to the other VDevs while you’re waiting on the first. Reading is probably unaffected, but writes will be faster with more VDevs.
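
If you want to check that yourself, a quick sketch with fio (assuming it’s installed and the pool is mounted at /srv) looks something like this; note that ZFS caching will flatter the numbers unless the data set is larger than RAM:

fio --name=randwrite --directory=/srv --rw=randwrite \
    --bs=4k --size=4G --iodepth=32 --ioengine=libaio \
    --runtime=60 --time_based --group_reporting

Run it once per pool layout and compare the reported write IOPS.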

Also, when resilvering the array, you have to read from each of the remaining drives in the VDev to reconstruct the missing data from parity. If you have 24 drives in a VDev, you have to read a block from all of them to rebuild each block; if you have only 8, there’s only 1/3 as much data to read. Meanwhile, the rest of the VDevs are available for real work.

Rebuilding the array also introduces stress that can cause other disks to fail, so it’s best to limit that to a smaller set of drives. I’ve heard many times of a resilver pushing sister drives that were already on the edge over it and failing the array.

Calculating Failure Rates

You can calculate the failure rates of different configurations with an on-line tool2. The chart scales the X axis by 50, so the differences in failure rates are not as large as they seem, but without that scaling you wouldn’t be able to see the lines at all. In most cases there’s not a large difference between, say, a 4x9 and a 3x12.

When To Use a Hot Spare

Given 9 disks where one fails, is it better to have 3 parity drop to 2 and run in degraded mode, or to have 2 parity drop to 1 with a spare that recovers without intervention? The math2 says it’s better to have the parity. But what about speed? When you lose a disk, 1 out of every 9 IOPS requires reconstructing data from parity. Anecdotally, the observed performance penalty is minor. So the only times to use a hot spare are:

  • When you have unused capacity in RAIDZ3 (i.e. almost never)
  • When IOPS require a mirror pool

Say you have 16 bays of 4TB drives. A 2x8 Z2 config gives you 48TB but you only want 32TB. Change that to a 2x8 Z3 and you get 40TB. Still only need 32TB? Change that to a 2x7 Z3 with 2 hot spares. Now you have 32TB with the maximum protection and the insurance of an automatic replacement.
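
The arithmetic behind those numbers, as a quick sanity check (ignoring metadata and padding overhead):

# usable TB = vdevs x (disks per vdev - parity disks) x drive TB
echo "2x8 Z2: $(( 2 * (8 - 2) * 4 )) TB"   # 48 TB
echo "2x8 Z3: $(( 2 * (8 - 3) * 4 )) TB"   # 40 TB
echo "2x7 Z3: $(( 2 * (7 - 3) * 4 )) TB"   # 32 TB, with 2 bays left for hot spares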

Or maybe you have a 37-bay system: pick a layout that uses 36 of the bays and keep the last one as a spare.

The other case is when your IOPS demands push past what RAIDZ can do and you must use a mirror pool. A failure there loses all redundancy for that VDev, and a hot spare is your only option.
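
A sketch of what that looks like, with placeholder device names:

zpool create \
-f -m /srv srv \
mirror sdb sdc \
mirror sdd sde \
mirror sdf sdg \
spare sdh

With the ZFS event daemon (zed) running, the spare is attached automatically when a member disk faults.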

When To Use a Distributed Spare

A distributed spare recovers from a disk loss in half the time3 and is always better than a dedicated spare, though you should almost never use a spare anyway. The only time to use a normal hot spare is when you have a single global spare.
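
Distributed spares come from dRAID, which is only available in newer OpenZFS releases (2.1 and later). A rough sketch for 24 disks with double parity, 8 data disks per redundancy group, and one distributed spare; check zpoolconcepts(7) on your system for the exact syntax it supports:

zpool create \
-f -m /srv srv \
draid2:8d:24c:1s \
sdb sdc sdd sde sdf sdg sdh sdi \
sdj sdk sdl sdm sdn sdo sdp sdq \
sdr sds sdt sdu sdv sdw sdx sdy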

Testing Speed

The speed difference isn’t charted, so let’s do some testing.

Given 24 disks, and deciding to live dangerously, should you have a single 24-disk VDev with three parity disks, or three VDevs with a single parity disk each? The case for the former is better resiliency; the case for the latter is better write speed and faster recovery from disk failures.

Build a 3-Wide RAIDZ1

Create the pool across 24 disks

zpool create \
-f -m /srv srv \
raidz sdb sdc sdd sde sdf sdg sdh sdi \
raidz sdj sdk sdl sdm sdn sdo sdp sdq \
raidz sdr sds sdt sdu sdv sdw sdx sdy 
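
Before loading data, it’s worth confirming the pool came up with the intended layout; zpool list -v shows the size of each VDev.

zpool status srv
zpool list -v srv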

Now copy a lot of random data to it

#!/bin/bash
# Fill the pool with incompressible data: ~1000 files of 1G of random bytes each.

no_of_files=1000
counter=0
while [[ $counter -le $no_of_files ]]; do
    echo "Creating file no $counter"
    touch random-file.$counter
    shred -n 1 -s 1G random-file.$counter
    let "counter += 1"
done

Now yank (literally) one of the physical disks and replace it

allen@server:~$ sudo zpool status  
                                                                             
  pool: srv                                                                                              
 state: DEGRADED                                                                                         
status: One or more devices could not be used because the label is missing or                            
        invalid.  Sufficient replicas exist for the pool to continue                                     
        functioning in a degraded state.                                                                 
action: Replace the device using 'zpool replace'.                                                        
   see: http://zfsonlinux.org/msg/ZFS-8000-4J                                                            
  scan: none requested                                                                                   
config:                                                                                                  
                                                                                                         
        NAME                     STATE     READ WRITE CKSUM                                              
        srv                      DEGRADED     0     0     0                                              
          raidz1-0               DEGRADED     0     0     0                                              
            sdb                  ONLINE       0     0     0                                              
            6847353731192779603  UNAVAIL      0     0     0  was /dev/sdc1                               
            sdd                  ONLINE       0     0     0                                              
            sde                  ONLINE       0     0     0                                              
            sdf                  ONLINE       0     0     0                                              
            sdg                  ONLINE       0     0     0                                              
            sdh                  ONLINE       0     0     0                                              
            sdi                  ONLINE       0     0     0                                              
          raidz1-1               ONLINE       0     0     0                                              
            sdr                  ONLINE       0     0     0                                              
            sds                  ONLINE       0     0     0                                              
            sdt                  ONLINE       0     0     0                                              
            sdu                  ONLINE       0     0     0                                              
            sdj                  ONLINE       0     0     0                                              
            sdk                  ONLINE       0     0     0                                              
            sdl                  ONLINE       0     0     0                                              
            sdm                  ONLINE       0     0     0  
          raidz1-2               ONLINE       0     0     0  
            sdv                  ONLINE       0     0     0  
            sdw                  ONLINE       0     0     0  
            sdx                  ONLINE       0     0     0  
            sdy                  ONLINE       0     0     0  
            sdn                  ONLINE       0     0     0  
            sdo                  ONLINE       0     0     0  
            sdp                  ONLINE       0     0     0  
            sdq                  ONLINE       0     0     0             
                                                                           
errors: No known data errors

allen@server:~$ lsblk                                                                                    
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT                                                              
sda      8:0    0 465.8G  0 disk                                                                         
├─sda1   8:1    0 449.9G  0 part /                                                                       
├─sda2   8:2    0     1K  0 part                                                                         
└─sda5   8:5    0  15.9G  0 part [SWAP]                                                                  
sdb      8:16   1 931.5G  0 disk                                                                         
├─sdb1   8:17   1 931.5G  0 part                                                                         
└─sdb9   8:25   1     8M  0 part                                                                         
sdc      8:32   1 931.5G  0 disk                                                                         
sdd      8:48   1 931.5G  0 disk                                                                         
├─sdd1   8:49   1 931.5G  0 part                                                                         
└─sdd9   8:57   1     8M  0 part    
...


sudo zpool replace srv 6847353731192779603 /dev/sdc -f

allen@server:~$ sudo zpool status
  pool: srv
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Fri Mar 22 15:50:21 2019
    131G scanned out of 13.5T at 941M/s, 4h7m to go
    5.40G resilvered, 0.95% done
config:

        NAME                       STATE     READ WRITE CKSUM
        srv                        DEGRADED     0     0     0
          raidz1-0                 DEGRADED     0     0     0
            sdb                    ONLINE       0     0     0
            replacing-1            OFFLINE      0     0     0
              6847353731192779603  OFFLINE      0     0     0  was /dev/sdc1/old
              sdc                  ONLINE       0     0     0  (resilvering)
            sdd                    ONLINE       0     0     0
            sde                    ONLINE       0     0     0
            sdf                    ONLINE       0     0     0
...

A few hours later…

$ sudo zpool status


  pool: srv
 state: DEGRADED
status: One or more devices has experienced an error resulting in data corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: resilvered 571G in 5h16m with 2946 errors on Fri Mar 22 21:06:48 2019
config:

    NAME                       STATE     READ WRITE CKSUM
    srv                        DEGRADED   208     0 2.67K
      raidz1-0                 DEGRADED   208     0 5.16K
        sdb                    ONLINE       0     0     0
        replacing-1            OFFLINE      0     0     0
          6847353731192779603  OFFLINE      0     0     0  was /dev/sdc1/old
          sdc                  ONLINE       0     0     0
        sdd                    ONLINE     208     0     1
        sde                    ONLINE       0     0     0
        sdf                    ONLINE       0     0     0
        sdg                    ONLINE       0     0     0
        sdh                    ONLINE       0     0     0
        sdi                    ONLINE       0     0     0
      raidz1-1                 ONLINE       0     0     0
        sdr                    ONLINE       0     0     0
        sds                    ONLINE       0     0     0
        sdt                    ONLINE       0     0     0
        sdu                    ONLINE       0     0     0
        sdj                    ONLINE       0     0     1
        sdk                    ONLINE       0     0     1
        sdl                    ONLINE       0     0     0
        sdm                    ONLINE       0     0     0
      raidz1-2                 ONLINE       0     0     0
        sdv                    ONLINE       0     0     0
        sdw                    ONLINE       0     0     0
        sdx                    ONLINE       0     0     0
        sdy                    ONLINE       0     0     0
        sdn                    ONLINE       0     0     0
        sdo                    ONLINE       0     0     0
        sdp                    ONLINE       0     0     0
        sdq                    ONLINE       0     0     0

The time was 5h16m. But notice the errors: during resilvering, drive sdd had 208 read errors and data was lost. This is the classic RAID situation where the stress of resilvering pushes another drive over the edge and you can’t fully recover the data.
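
If you end up here, zpool status -v will list the individual files affected by the errors, so you know exactly what has to come from backup:

sudo zpool status -v srv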

It’s somewhat questionable whether this is a valid test, as the effect of the errors on resilvering duration is unknown. But on with the test.

Let’s wipe that away and create a raidz3

sudo zpool destroy srv


zpool create \
-f -m /srv srv \
raidz3 \
 sdb sdc sdd sde sdf sdg sdh sdi \
 sdj sdk sdl sdm sdn sdo sdp sdq \
 sdr sds sdt sdu sdv sdw sdx sdy



This time, simulate the failure in software: use zdb to look up the GUID of a member disk, take that disk offline, and then replace it.

zdb
zpool offline srv 15700807100581040709
sudo zpool replace srv 15700807100581040709 sdc




allen@server:~$ sudo zpool status
  pool: srv
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Mar 24 10:07:18 2019
    27.9G scanned out of 9.14T at 362M/s, 7h19m to go
    1.21G resilvered, 0.30% done
config:

    NAME             STATE     READ WRITE CKSUM
    srv              DEGRADED     0     0     0
      raidz3-0       DEGRADED     0     0     0
        sdb          ONLINE       0     0     0
        replacing-1  OFFLINE      0     0     0
          sdd        OFFLINE      0     0     0
          sdc        ONLINE       0     0     0  (resilvering)
        sde          ONLINE       0     0     0
        sdf          ONLINE       0     0     0
        ...


allen@server:~$ sudo zpool status

  pool: srv
 state: ONLINE
  scan: resilvered 405G in 6h58m with 0 errors on Sun Mar 24 17:05:50 2019
config:

    NAME        STATE     READ WRITE CKSUM
    srv         ONLINE       0     0     0
      raidz3-0  ONLINE       0     0     0
        sdb     ONLINE       0     0     0
        sdc     ONLINE       0     0     0
        sde     ONLINE       0     0     0
        sdf     ONLINE       0     0     0
        sdg     ONLINE       0     0     0
        ...

The time? 6h58m. Longer, but safer.

