Disk Replacement
You may get an alert in the GUI along the lines of:
Device: /dev/sda [SAT], 8 Currently unreadable (pending) sectors.
This is one of the predictive failures that Backblaze mentions. You should replace the drive. You may also get an outright failure such as:
Pool pool01 state is DEGRADED: One or more devices has experienced an unrecoverable error.
That’s a drive that has already failed and likewise you must replace it.
Using The GUI
On larger systems you’ll need both the GUI and the command line.
- Find the drive’s serial number and take it offline if needed
- Locate the drive and turn on the chassis indicator in the CLI
- Replace the drive
Find The Serial Number and Set It Offline
In the GUI, examine the failing device. If it hasn’t fully faulted yet, use the /dev/sdX identifier in the alert to find it.
Storage -> (pool name) -> Manage Devices -> (Select the disk) -> Disk Info
You’ll see a serial number similar to “Z4F18FFX” and can copy it to your clipboard and offline the disk if it’s not already faulted.
ZFS Info -> Offline
Locate The Disk
Large arrays have lights to help locate the disk. Use the tool sas3ircu for this. If it’s not already present you may need to track it down (see troubleshooting below). To find out, enter the CLI by navigating to:
System -> Shell
Issue this command to search for the location of the disk.
sudo sas3ircu 0 display | grep -B 10 ZC11HPN3
Device is a Hard disk
Enclosure # : 3
Slot # : 5
...
...
Use the same utility to turn on the locate LED.
sudo sas3ircu 0 locate 3:5 ON
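Reading the enclosure and slot out of the grep output by eye is easy to get wrong on a large chassis. The following is a minimal sketch, not part of sas3ircu itself, that pulls the two numbers out of the display output. The function name slot_for_serial is hypothetical, and it assumes the "Enclosure #" and "Slot #" labels appear before the serial number within each device block, as in the output above.

```shell
# Read `sas3ircu <controller> display` output on stdin and print
# ENCLOSURE:SLOT for the first device block that mentions the serial.
# Assumes the Enclosure and Slot lines precede the serial in each block.
slot_for_serial() {
  awk -v serial="$1" '
    /Enclosure #/ { enc = $NF }     # remember the most recent enclosure
    /Slot #/      { slot = $NF }    # remember the most recent slot
    index($0, serial) { print enc ":" slot; exit }
  '
}
```

Used as sudo sas3ircu 0 display | slot_for_serial ZC11HPN3, it should print a value such as 3:5 that can be passed straight to the locate command. It prints nothing useful if the serial is not found, so check the result before lighting a slot.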
Replace The Drive
Swap the physical device, then back in the GUI select the drive and click “Replace” in the Disk Info menu.
Drive -> (Disk Info) -> Replace
The new window should allow you to pick the ’new’ drive. There should be only one choice. If nothing is listed, you may need to wipe the disk with wipefs -a /dev/sdX and repeat.
Don’t forget to turn off the locate LED
sudo sas3ircu 0 locate 3:5 OFF
At The Command Line
Using the CLI to replace the disk is ‘strongly advised against’. The GUI takes several steps to prepare the disk and adds a partition to the pool, not the whole disk.
Identify and Off-Line The Disk
Use the ID from zpool status (a partition UUID in the example below) to identify the failing device, then take it offline.
sudo zpool status
raidz3-2 ONLINE 0 0 0
976e8f03-931d-4c9f-873e-048eeef08680 ONLINE 0 0 0
f9384b4f-d94a-43b6-99c4-b8af6702ca42 ONLINE 0 0 0
c5e4f2e5-62f2-41cc-a8de-836ff9683332 ONLINE 0 0 35.4K
sudo zpool offline pool01 c5e4f2e5-62f2-41cc-a8de-836ff9683332
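On a pool with many vdevs, the one row with non-zero counters is easy to miss. As a rough sketch (the function name rows_with_errors is hypothetical, and it assumes the five-column NAME STATE READ WRITE CKSUM layout shown above), this filter prints the name of every row whose error counters are not all zero:

```shell
# Read `zpool status` output on stdin and print the name of each row
# whose READ, WRITE or CKSUM counter is non-zero.
# Rows with a trailing note (six or more fields) are skipped.
rows_with_errors() {
  awk '
    # A counter looks like 0, 12 or 35.4K
    function is_count(s) { return s ~ /^[0-9.]+[KMGTPE]?$/ }
    NF == 5 && is_count($3) && is_count($4) && is_count($5) &&
      ($3 != "0" || $4 != "0" || $5 != "0") { print $1 }
  '
}
```

Run as sudo zpool status pool01 | rows_with_errors. Parent vdevs or the pool itself may also appear in the output if they report errors, so treat the result as a shortlist to inspect rather than a verdict.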
Get The Serial and Blink The Indicator
find /dev/disk -name c5e4f2e5-62f2-41cc-a8de-836ff9683332 -exec ls -lah {} \;
lrwxrwxrwx 1 root root 11 Jan 10 08:41 /dev/disk/by-partuuid/c5e4f2e5-62f2-41cc-a8de-836ff9683332 -> ../../sdae1
sudo smartctl -a /dev/sdae1 | grep Serial
Serial Number: WJG1LNP7
sas3ircu list
sas3ircu 0 display
sas3ircu 0 display | grep -B 10 WJG1LNP7
(Inspect the output for Enclosure and Slot to use in that order below)
sudo sas3ircu 0 locate 3:5 ON
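The symlink above resolves to a partition (sdae1), while the later zpool replace step refers to the whole disk (sdae). A small sketch of that conversion is below. The function name parent_disk is hypothetical and it works purely on the name, assuming the usual Linux naming where sdX partitions end in a number and nvme or mmcblk partitions end in pN.

```shell
# Print the whole-disk name for a partition name by stripping the
# partition suffix. String manipulation only; the device is not checked.
parent_disk() {
  case "$1" in
    # nvme0n1p2 -> nvme0n1, mmcblk0p1 -> mmcblk0
    *nvme*|*mmcblk*) printf '%s\n' "$1" | sed -E 's/p[0-9]+$//' ;;
    # sdae1 -> sdae
    *)               printf '%s\n' "$1" | sed -E 's/[0-9]+$//' ;;
  esac
}
```

For example, parent_disk /dev/sdae1 gives /dev/sdae. On a live system, lsblk -no PKNAME /dev/sdae1 should report the same parent from the kernel's point of view, which is the safer check when the naming is unusual.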
Physically Replace The Drive
This is a physical swap - the indicator will be blinking red. Turn it off when you’re done
sudo sas3ircu 0 locate 3:5 OFF
Logically Replace The Removed Drive
(It’s probably the same device identifier, but you can tail the message log to make sure)
sudo dmesg | tail
sudo zpool replace -f pool01 c5e4f2e5-62f2-41cc-a8de-836ff9683332 sdae
sudo zpool status
raidz3-2 DEGRADED 0 0 0
976e8f03-931d-4c9f-873e-048eeef08680 ONLINE 0 0 0
f9384b4f-d94a-43b6-99c4-b8af6702ca42 ONLINE 0 0 0
replacing-2 DEGRADED 0 0 0
c5e4f2e5-62f2-41cc-a8de-836ff9683332 REMOVED 0 0 0
sdae ONLINE 0 0 0
Hot Spare
If you’re using a hot spare, you may need to detach it after the resilver finishes so that it returns to being a spare. Check the spare’s ID at the bottom of the status output and then detach it.
zpool status
zpool detach pool01 9d794dfd-2ef6-432d-8252-0c93e79509dc
Troubleshooting
When working at the command line, you may need to download the sas3ircu utility from Broadcom.
wget https://docs.broadcom.com/docs-and-downloads/host-bus-adapters/host-bus-adapters-common-files/sas_sata_12g_p16_point_release/SAS3IRCU_P16.zip
If you forgot what light you turned on, you can turn off all slot lights with something like:
for X in {0..23};do sas3ircu 0 locate 2:$X OFF;done
for X in {0..11};do sas3ircu 0 locate 3:$X OFF;done
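Since enclosure numbers and slot counts differ between chassis, it can help to preview the commands before running them. This is a minimal sketch with a dry-run switch. The function name locate_off_range and the DRY_RUN variable are hypothetical, and controller 0 is assumed as in the commands above.

```shell
# Turn off the locate LED for every slot FIRST..LAST in one enclosure.
# With DRY_RUN set to any non-empty value, the commands are only printed.
locate_off_range() {
  enc=$1; first=$2; last=$3
  slot=$first
  while [ "$slot" -le "$last" ]; do
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty
    ${DRY_RUN:+echo} sas3ircu 0 locate "$enc:$slot" OFF
    slot=$((slot + 1))
  done
}
```

Running DRY_RUN=1 locate_off_range 3 0 11 prints the twelve commands for enclosure 3 without touching the hardware. Dropping DRY_RUN runs them for real, which may need sudo depending on how the shell was started.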
To recreate the GUI process at the command line, use these commands, adapted from https://www.truenas.com/community/resources/creating-a-degraded-pool.100/. Note that gpart and glabel are not present on TrueNAS Scale, so the first set of commands is AI-informed.
# On TrueNAS Scale
sudo sgdisk --zap-all /dev/sdX
sudo sgdisk -o /dev/sdX
sudo sgdisk -n 1:0:0 -t 1:BF00 /dev/sdX
# Find the UUID and replace
ls -l /dev/disk/by-partuuid/ | grep sdX
sudo zpool replace pool01 /dev/sdad /dev/disk/by-partuuid/4c4f943a-569b-4e54-b97d-9c72adf2ed5a
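A plain grep for sdX can match more than intended (for instance, sda also matches sdad1). As a sketch under the assumption that ls -l prints each entry as "UUID -> ../../PARTITION", the hypothetical helper below returns the full by-partuuid path for an exact partition name:

```shell
# Read `ls -l /dev/disk/by-partuuid/` output on stdin and print the full
# by-partuuid path whose symlink points at exactly the given partition.
partuuid_path_for() {
  awk -v part="$1" '
    # Match only symlink lines whose target is ../../<part>
    NF >= 3 && $(NF-1) == "->" && $NF == ("../../" part) {
      print "/dev/disk/by-partuuid/" $(NF-2)
    }
  '
}
```

For example, ls -l /dev/disk/by-partuuid/ | partuuid_path_for sdad1 should print the path to pass as the new device in zpool replace. Empty output means the partition was not found, in which case re-check the partitioning step before going further.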