Replication

Replication is how you back up and copy ZFS. It turns a snapshot into a bit stream that you can pipe somewhere else. Usually, you pipe it over the network to another system and connect it to zfs receive.

It is also the only way to do this properly. The snapshot is what allows point-in-time handling, and the receive ensures consistency. A snapshot is more than just the files it contains: two identically named filesystems with the same files put in place by rsync are not the same filesystem, and you can't jump-start a replication that way.
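One way to see the difference: ZFS tracks snapshots by an internal GUID that travels with the replication stream, and receive matches on that, not on the name. A quick check, using the pool and snapshot names from the examples below:

# A replicated snapshot reports the same guid on both sides;
# an identically named snapshot you created by hand will not.
zfs get -H -o value guid pool01@snap1
ssh some.other.server zfs get -H -o value guid pool02@snap1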

Basic Examples

# On the receiving side, create a pool to hold the filesystem
zpool create -m none pool02 raidz1 sdc sdd sde sdf

# On the sending side, pipe over SSH. The -F forces the filesystem on the receiving side to be replaced
zfs snapshot pool01@snap1
zfs send pool01@snap1 | ssh some.other.server zfs receive -F pool02

Importantly, this replaces the root filesystem on the receiving side. The filesystem you just copied over is accessible when the replication is finished, assuming it's mounted and you're only using the default root. If you're using multiple filesystems, you'll want to send recursively so you pick up children like the archive filesystem.

# The -r and -R trigger recursive operations
zfs snapshot -r pool01@snap1
zfs send -R pool01@snap1 | ssh some.other.server zfs receive -F pool02
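Since the receiving pool above was created with -m none, the received filesystems won't necessarily be mounted. A quick way to check, and to mount them if you want to poke around (a sketch using the names above):

# On the receiving side: see what arrived and whether it's mounted
zfs list -r -o name,mountpoint,mounted pool02

# Give the root a mountpoint and mount everything under it
zfs set mountpoint=/pool02 pool02
zfs mount -a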

You can also pick a specific filesystem to send. You can name it whatever you like on the other side, or replace something already named.

# Sending just the archive filesystem
zfs snapshot pool01/archive@snap1
zfs send pool01/archive@snap1 | ssh some.other.server zfs receive -F pool02/archive

And of course, you may have two pools on the same system. One line in a terminal is all you need.

zfs send -R pool01@snap1 | zfs receive -F pool02

Using Mbuffer or Netcat

These are much faster than ssh if you don't care about someone capturing the traffic. But they do require you to start both ends separately.

# On the receiving side
ssh some.other.system
mbuffer -4 -s 128k -m 1G -I 8990 | zfs receive -F pool02/archive

# On the sending side
zfs send pool01/archive@snap1 | mbuffer -s 128k -m 1G -O some.other.system:8990

You can also use good ol' netcat. It's a little slower than mbuffer but still faster than SSH. Combine it with pv for some visuals.

# On the receiving end
nc -l 8989 | pv -trab -B 500M | zfs recv -F pool02/archive

# On the sending side
zfs send pool01/archive@snap1 | nc some.other.system 8989

Estimating The Size

You may want to know how big the transfer will be, to estimate time or capacity. You can do this with a dry run.

zfs send -nv pool01/archive@snap1
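To turn that estimate into a progress bar, you can grab the machine-readable number with -P and hand it to pv. A sketch, assuming a reasonably recent OpenZFS where the dry-run output goes to stdout:

# Pull the estimated stream size out of the parsable dry-run output
SIZE=$(zfs send -nvP pool01/archive@snap1 | awk '/^size/ {print $2}')

# pv then shows percentage and ETA as the real send runs
zfs send pool01/archive@snap1 | pv -s "$SIZE" | ssh some.other.server zfs receive -F pool02/archive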

Use a Resumable Token

Any interruption and you have to start all over again - or do you? If it's a long-running transfer, tell the receiving side to keep a resume token and you can restart from where it broke, turning a tragedy into a mere annoyance.

# On the receiving side, add -s
ssh some.other.system
mbuffer -4 -s 128k -m 1G -I 8990 | zfs receive -s -F pool02/archive

# Send the stream normally
zfs send pool01/archive@snap1 | mbuffer -s 128k -m 1G -O some.other.system:8990

# If you get interrupted, on the receiving side, look up the token
zfs get -H -o value receive_resume_token pool02/archive

# Then use that on the sending side to resume where you left off
zfs send -t 'some_long_key' | mbuffer -s 128k -m 1G -O some.other.system:8990

If you decide you don't want to resume, clean up with zfs receive -A to release the space consumed by the pending transfer.

# On the receiving side
zfs recv -A pool02/archive

Sending an Incremental Snapshot

After you've sent the initial snapshot, subsequent ones are much smaller. Even very large backups can be kept current if you 'pre-seed' before taking the other pool remote.

# at some point in past
zfs snapshot pool01/archive@snap1
zfs send pool01/archive@snap1 ....

# Now we'll snap again and send just the changes between the two using -i
zfs snapshot pool01/archive@snap2
zfs send -i pool01/archive@snap1 pool01/archive@snap2 ...
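The 'pre-seed' workflow mentioned above might look roughly like this. It's only a sketch: the portable disk device (/dev/sdx) and the timing are assumptions.

# Seed locally: build the destination pool on a portable disk and do the big send over a local pipe
zpool create -m none pool02 /dev/sdx
zfs snapshot pool01/archive@seed
zfs send pool01/archive@seed | zfs receive -F pool02/archive

# Export the pool, physically move the disk, then import it on the remote system
zpool export pool02
# ... at the remote site ...
zpool import pool02

# From then on, only the deltas cross the network
zfs snapshot pool01/archive@snap2
zfs send -i pool01/archive@seed pool01/archive@snap2 | ssh some.other.server zfs receive -F pool02/archive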

Sending Intervening Snapshots

If you are jumping more than one snapshot ahead, the intervening ones are skipped. If you want to include them for some reason, use the -I option.

# This will send all the snaps between 1 and 9
zfs send -I pool01/archive@snap1 pool01/archive@snap9 ...
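You can confirm they all arrived by listing the snapshots on the receiving side (same names as the earlier examples):

# On the receiving side, or over ssh from the sender
ssh some.other.server zfs list -t snapshot -r pool02/archive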

Changes Are Always Detected

You'll often need -F on the receive to force a rollback of changes on the destination. Even though you haven't deliberately used the remote copy, ZFS may think you have if it's mounted and atime updates are on.
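One way to reduce that drift is to keep the receiving copy read-only, or at least turn off atime updates there, so browsing it doesn't modify anything. A small sketch using the receiving-side names from earlier:

# On the receiving side
zfs set readonly=on pool02/archive
zfs set atime=off pool02/archive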

You Must Have a Snapshot in Common

You need at least one snapshot in common at both locations, and it must have been sent from one to the other, not just named the same. Say you create snap1 and send it. Later, you create snap2 and, thinking you don't need it anymore, delete snap1. Even though you have snap1 on the destination, you cannot send snap2 as a delta. You need snap1 in both locations to generate the delta, so you're out of luck and must send snap2 as a full snapshot.

You can use a ZFS feature called a bookmark as an alternative, but that's something you set up in advance, so it won't save you once you're already in the above scenario.
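For reference, this is what setting one up in advance looks like: you bookmark the snapshot while it still exists, and the bookmark can later stand in for it as the incremental source on the sending side. A sketch; the receiving side still needs the real snap1.

# While snap1 still exists, bookmark it
zfs bookmark pool01/archive@snap1 pool01/archive#snap1

# Now snap1 can be destroyed locally and you can still send the delta from the bookmark
zfs send -i pool01/archive#snap1 pool01/archive@snap2 | ssh some.other.server zfs receive -F pool02/archive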

A Full Example of an Incremental Send

Take a snapshot, estimate the size with a dry run, then send it using a resume token and forcing changes on the receiving side.

zfs snapshot pool01/archive@12th
zfs send -nv -i pool01/archive@11th pool01/archive@12th
zfs send -i pool01/archive@11th pool01/archive@12th | pv -trab -B 500M | ssh some.other.server zfs recv -F -s pool02/archive

Here's an example of a recursive, incremental send to a file. The snapshot takes -r for recursive, and the send takes -R.

# Take the snapshot
zfs snapshot -r pool01/archive@21

# Estimate the size
zfs send -nv -R -i pool01/archive@20 pool01/archive@21

# Mount a portable drive to /mnt - a traditional ext4 or whatever
zfs send -Ri pool01/archive@20 pool01/archive@21 > /mnt/backupfile

# When you get to the other location, receive from the file
zfs receive -s -F pool02/archive < /mnt/backupfile
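If you want to sanity-check the file before or after hauling it around, zfs receive has its own dry-run mode. A sketch, using the names above:

# Show what the stream would create, without writing anything
zfs receive -nv pool02/archive < /mnt/backupfile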

Sending The Whole Pool

This is a common thought when you're starting, but it ends up being deceptive because pools aren't things that can be sent, only filesystems. So what you're actually doing is a recursive send of the root filesystem with an implicit snapshot created on the fly. That works, but you won't be able to refer to it later for incremental updates, so you're better off not doing it.

# Unmount the filesystems first (zfs unmount -a unmounts all mounted ZFS filesystems)
zfs unmount -a

# Send the unmounted filesystem with an implicit snapshot
zfs send -R pool01 ...

Auto Replication

You don't want to do this by hand all the time. One way to automate it is with a simple script. If you've already installed zfs-auto-snapshot, you may have snapshots that look like this:

# use '-o name' to get just the snapshot name without all the details 
# use '-s creation' to sort by creation time
zfs list -t snapshot -o name -s creation pool01/archive

pool01/archive@auto-2024-10-13_00-00
pool01/archive@auto-2024-10-20_00-00
pool01/archive@auto-2024-10-27_00-00
pool01/archive@auto-2024-11-01_00-00
pool01/archive@auto-2024-11-02_00-00

You can grab the last two like this, then use send and receive. Adjust the grep to match just the dailies as needed.

CURRENT=$(zfs list -t snapshot -o name -s creation pool01/archive | grep auto | tail -1)
LAST=$(zfs list -t snapshot -o name -s creation pool01/archive | grep auto | tail -2 | head -1)

zfs send -i $LAST $CURRENT | pv -trab -B 500M | ssh some.other.server zfs recv -F -s pool02/archive

This is pretty basic and can fall out of sync. You can bring it up a notch by asking the other side to list its snapshots with a zfs list over ssh and comparing against yours to find the most recent match, and by adding a resume token, as sketched below. But by that point you may consider just using a purpose-built replication tool.

Such a tool will probably look to replace your auto snapshots as well, and that's probably fine. I haven't used one myself, as I scripted this back in the day, but I will probably start.
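Here's a rough sketch of that comparison, assuming ssh access to some.other.server and the dataset names used above. It finds the newest snapshot both sides have in common and sends everything after it.

# Snapshot names (without the dataset prefix), newest first, on each side
LOCAL=$(zfs list -H -t snapshot -o name -S creation pool01/archive | sed 's/.*@//')
REMOTE=$(ssh some.other.server zfs list -H -t snapshot -o name -S creation pool02/archive | sed 's/.*@//')

# The newest snapshot present on both sides
COMMON=$(for s in $LOCAL; do echo "$REMOTE" | grep -qx "$s" && { echo "$s"; break; }; done)

# The newest local snapshot
LATEST=$(echo "$LOCAL" | head -1)

# Send everything between the common one and the latest, resumably
if [ -n "$COMMON" ] && [ "$COMMON" != "$LATEST" ]; then
  zfs send -I pool01/archive@"$COMMON" pool01/archive@"$LATEST" \
    | ssh some.other.server zfs receive -s -F pool02/archive
fi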

Next Step

Sometimes, good disks go bad. Learn how to catch them before they do, and replace them when needed.

Scrub and Replacement

