Not backup; it's simply copying data between multiple locations. More like mirroring.
Sync
- 1: rsync
- 1.1: Basic Rsync
- 1.2: Rsync Daemon
- 1.3: Tunneled Rsync
- 1.4: Rsync Schedule
- 1.5: Rsync Without Login
- 2: Tar Pipe
- 3: Unison
1 - rsync
This is used enough that it deserves several pages.
1.1 - Basic Rsync
If you regularly copy lots of files, it's best to use rsync. It's efficient, as it only copies what has changed, and secure, since it can run over SSH. Many other tools, such as BackupPC and Duplicity, use rsync or its algorithm under the hood. When you are doing cross-platform data replication it may be the only tool that works, so it's worth learning.
Local Copies
Generally, it's about 10% slower than just using cp -a, so sometimes it makes sense to start with cp -a and finish up with rsync.
rsync \
--archive \
--delete \
--dry-run \
--human-readable \
--inplace \
--itemize-changes \
--progress \
--verbose \
/some/source/Directory \
/some/destination/
The explanations of the more interesting options are:
--archive: Recurses and preserves permissions, times, owner, group, symlinks and device files, as you'd expect (hard links, ACLs and extended attributes need -H, -A and -X)
--delete: Removes extraneous files at the destination that no longer exist at the source (i.e. _not_ a merge)
--dry-run: Makes no changes. This is important for testing. Remove it for the actual run
--inplace: Writes changes directly into the destination file, rather than the default behavior of building a temporary copy on the other end and then moving it into place. It needs less free space and can be slightly faster, but an interrupted transfer leaves a partly updated file
If you don't trust the timestamps at your destination, you can add the --checksum option. When you're copying locally, though, this may be slower than just recopying the whole thing.
A note about trailing slashes: in the source above, there is no trailing slash. But we could have added one, or even a /*. Here's what happens when you do that.
- No trailing slash - This syncs the directory itself, creating /some/destination/Directory.
- Trailing slash - This syncs the contents of the directory into the destination, rather than the directory itself.
- Trailing /* - Try not to do this. The shell expands the * before rsync runs, so each item in the source directory is synced as if you had typed it individually (and hidden dotfiles are skipped). Extraneous top-level files at the destination are not deleted, so the top level becomes a merge even if you pass --delete.
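To see why /* behaves differently, it helps to look at what rsync actually receives. This minimal sketch assumes bash. show_glob_args is a hypothetical helper, and the demo tree is a throwaway temp directory. It prints the arguments the shell hands over for source/*:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the source arguments rsync would receive for
# "rsync ... DIR/* dest/". The shell expands the glob before rsync runs,
# so rsync never sees DIR itself, and dotfiles are left out by default.
show_glob_args() {
  local dir=$1
  printf '%s\n' "$dir"/*
}

# Throwaway demo tree: one subdirectory, one file, one dotfile
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/src/Sub"
touch "$demo/src/a.txt" "$demo/src/.hidden"

show_glob_args "$demo/src"   # lists .../Sub and .../a.txt, but not .hidden
```

Each of those paths is a separate source. As a result, --delete can only prune inside Sub, never at the top level of the destination.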
Across the Network
This uses SSH for encryption and authentication.
rsync \
--archive \
--delete \
--dry-run \
--human-readable \
--inplace \
--itemize-changes \
--progress \
--verbose \
/srv/Source_Directory/* \
[email protected]:/srv/Destination_Directory
Windows to Linux
One easy way to do this is to grab a bundled version of rsync and ssh for Windows from the cwRsync folks:
<https://www.itefix.net/content/cwrsync-free-edition>
Extract the standalone client to a folder and edit the .cmd file to add this at the end (the ^ is the cmd.exe line-continuation character):
rsync ^
--archive ^
--delete ^
--dry-run ^
--human-readable ^
--inplace ^
--itemize-changes ^
--no-group ^
--no-owner ^
--progress ^
--verbose ^
--stats ^
[email protected]:/srv/media/video/movies/* /cygdrive/D/Media/Video/Movies/
pause
Mac OS X to Linux
The version that comes with OS X is a 2.6.9 (or so) variant. You can use that, or install a more recent 3.x release, which has some speed improvements and more features. To get it, install Homebrew, then issue the command:
brew install rsync
One of the issues with syncing between OS X and Linux is the handling of Mac resource forks (file metadata). Let's assume that you are only interested in data files (such as mp4). You are leaving out the extended attributes that Apple uses to store icons and other assorted data (which replaced the old resource fork).
Since we are going between file systems, we don't use the -a option, which preserves file attributes. Instead, we specify only --recursive and --times. We also use some excludes to keep Mac-specific files from tagging along.
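# Homebrew's path on Intel Macs; on Apple Silicon it is /opt/homebrew/bin/rsync
/usr/local/bin/rsync \
--exclude '.DS*' \
--exclude '._*' \
--human-readable \
--inplace \
--progress \
--recursive \
--times \
--verbose \
--itemize-changes \
--dry-run \
"/Volumes/3TB/source/" \
[email protected]:"/Volumes/3TB/"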
Importantly, we are ‘itemizing’ and doing a ‘dry-run’. When you do, you will see a report like:
skipping non-regular file "Photos/Summer.2004"
skipping non-regular file "Photos/Summer.2005"
.d..t....... Documents/
.d..t....... Documents/Work/
cd++++++++++ ISOs/
<f++++++++++ ISOs/Office.ISO
A line starting with cd+++ indicates a directory will be created, and <f+++ indicates a file is going to be sent. When it says it is skipping a non-regular file, that's (in this case, at least) a symlink. You can include symlinks with --links, but if your paths don't match up on both systems, these links will fail.
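The codes are compact but regular. To help read a long dry run, here is a sketch of a hypothetical describe_itemize bash function. It decodes the first two characters of each code and the all-plus "new item" marker. See the --itemize-changes section of the rsync man page for the full table of attribute letters:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn one --itemize-changes line into plain English.
# Decodes only the update type (1st char), the file type (2nd char) and the
# all-"+" marker rsync uses for a brand-new item; the per-attribute letters
# (c, s, t, p, o, g, ...) are not spelled out here.
describe_itemize() {
  local line=$1
  local flags=${line%% *}                 # the code before the first space
  local path=${line#"$flags"}
  path=${path#"${path%%[! ]*}"}           # trim the padding after the code
  local update=${flags:0:1} kind=${flags:1:1} attrs=${flags:2}
  local new_item='^[+]+$' what

  # Lines like "*deleting   old.txt" carry a message instead of a code
  if [[ $update == '*' ]]; then
    echo "${flags:1}: $path"
    return
  fi

  case $kind in
    f) what="file" ;;
    d) what="directory" ;;
    L) what="symlink" ;;
    D) what="device" ;;
    S) what="special file" ;;
    *) what="item" ;;
  esac

  if [[ $attrs =~ $new_item ]]; then
    case $update in
      c)   echo "create $what: $path" ;;
      '<') echo "send new $what: $path" ;;
      '>') echo "receive new $what: $path" ;;
      *)   echo "new $what: $path" ;;
    esac
  else
    case $update in
      '<') echo "send updated $what: $path" ;;
      '>') echo "receive updated $what: $path" ;;
      c)   echo "change $what locally: $path" ;;
      h)   echo "hard link $what: $path" ;;
      .)   echo "update attributes of $what: $path" ;;
      *)   echo "unrecognized change to $what: $path" ;;
    esac
  fi
}

describe_itemize 'cd++++++++++ ISOs/'             # → create directory: ISOs/
describe_itemize '<f++++++++++ ISOs/Office.ISO'   # → send new file: ISOs/Office.ISO
```

To translate a whole dry run, you could pipe the output through a read loop that calls it on each line. Lines that aren't itemized codes (such as the "skipping non-regular file" messages) will come out as "unrecognized".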
Spaces in File Names
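Generally you both quote and escape. The quotes get the path past the local shell, and the backslash gets the space past the remote shell.
rsync ^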
--archive ^
--itemize-changes ^
--progress ^
[email protected]:"/srv/media/audio/Music/Basil\ Poledouris" ^
/cygdrive/c/Users/Allen/Music
It's also said that you can use single quotes without escaping if you add the --protect-args (-s) option, which keeps the remote shell from splitting the path:
--protect-args ^
[email protected]:'/srv/media/audio/Music/Basil Poledouris' ^
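The reason for both quoting and escaping is that the remote path can be split a second time on the far side. This happens on rsync versions that pass the path through the remote login shell. This sketch fakes that second pass locally with sh -c. remote_words is a hypothetical helper, and nothing goes over the network:

```shell
#!/usr/bin/env bash
# Hypothetical helper: show how a remote shell would split a path.
# "sh -c" stands in for the remote login shell; each word it ends up
# with is printed in brackets on its own line.
remote_words() {
  # $1 is the path as it looks after the local shell removed its quotes
  sh -c "printf '[%s]\n' $1"
}

remote_words '/srv/media/audio/Music/Basil Poledouris'    # two words: the space splits it
remote_words '/srv/media/audio/Music/Basil\ Poledouris'   # one word: the backslash survives
```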
List of Files
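You may want to combine find and rsync to copy only files that match specific criteria. Use the --files-from parameter. Note that --files-from implies --relative. In this example, each file therefore lands under /mnt/media/video/srv/media/video/..., keeping its full source path.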
ssh server.gattis.org find /srv/media/video -type f -mtime -360 > list
rsync --progress --files-from=list server.gattis.org:/ /mnt/media/video/
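If you control the find side, one variation is to write paths relative to the source directory. This sketch assumes a local directory, GNU find and a hypothetical recent_files_list helper. The commented ssh/rsync lines show the same idea against the server above, but are untested:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files modified within the last N days, with
# paths relative to the source directory. Since --files-from implies
# --relative, relative entries avoid recreating the srv/media/video/...
# prefix under the destination.
recent_files_list() {
  local src=$1 days=$2
  # Running find from inside the source makes every entry start with "./"
  (cd "$src" && find . -type f -mtime "-$days")
}

# Same idea against the remote server from the example above (untested):
#   ssh server.gattis.org 'cd /srv/media/video && find . -type f -mtime -360' > list
#   rsync --progress --files-from=list server.gattis.org:/srv/media/video/ /mnt/media/video/
```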
Seeding an Initial Copy
If you have no data on the destination to begin with, rsync will be somewhat slower than a straight copy. On a local system, simply use cp -a (to preserve file times). On a remote system, you can use tar to minimize the per-file overhead.
tar -cf - /path/to/dir | ssh remote_server 'tar -xvf - -C /absolute/path/to/remotedir'
It is also possible to use rsync with the --whole-file option, which skips the delta-transfer algorithm that slows rsync down (it is already the default for local copies). I have not tested its speed, though.
Time versus size
Rsync uses time and size to determine if a file should be updated. If you have already copied files by some other means and are now trying to sync, you may find your modification times are off. Add --size-only or --modify-window=NUM. Even better, correct your times (on OS X, this requires installing coreutils to get the GNU ls command).
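The rule rsync applies by default (its "quick check") can be sketched as follows. This illustrates how --size-only and --modify-window change the decision; it is not rsync's own code. would_transfer is a hypothetical helper, and the sketch assumes bash and GNU stat:

```shell
#!/usr/bin/env bash
# Hypothetical helper mimicking rsync's quick check: a file is skipped when
# its size matches and its mtime differs by no more than WINDOW seconds.
# Passing "size-only" as the 4th argument ignores times, like --size-only.
would_transfer() {
  local src=$1 dst=$2 window=${3:-0} size_only=${4:-no}
  [ -e "$dst" ] || { echo yes; return; }
  local s_size s_time d_size d_time
  read -r s_size s_time < <(stat -c '%s %Y' "$src")
  read -r d_size d_time < <(stat -c '%s %Y' "$dst")
  if [ "$s_size" -ne "$d_size" ]; then echo yes; return; fi
  if [ "$size_only" = "size-only" ]; then echo no; return; fi
  local diff=$(( s_time - d_time ))
  if (( diff < 0 )); then diff=$(( -diff )); fi
  if (( diff > window )); then echo yes; else echo no; fi
}

# Usage: would_transfer source/file dest/file [window-seconds] [size-only]
```

For example, two identical files whose times differ by two seconds count as changed with the default window. They are skipped with a window of 2, or with size-only.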
http://notemagnet.blogspot.com/2009/10/getting-started-with-rsync-for-paranoid.html
http://www.chrissearle.org/blog/technical/mac_homebrew_and_homebrew_alt
http://ubuntuforums.org/showthread.php?t=1806213
1.2 - Rsync Daemon
Some low-power devices, such as the Raspberry Pi, struggle with the encryption overhead of ssh, rsync's default network transport.
If you don’t need encryption or authentication, you can significantly speed things up by using rsync in daemon mode.
Push Config
In this example, we’ll push data from our server to the low-power client.
Create a Config File
Create a config file on the server that we'll send over to the client later.
nano client-rsyncd.conf
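# These paths must be writable by the (non-root) user that launches the daemon
log file = /tmp/rsyncd.log
pid file = /tmp/rsyncd.pid
lock file = /tmp/rsyncd.lock
# This is the name you refer to in rsync. The path is where that maps to.
[media]
path = /var/media
comment = Media
read only = false
timeout = 300
# uid and gid only take effect when the daemon is started as root
uid = you
gid = you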
Start and Push On-Demand
The default daemon port, 873, is privileged and needs root, so we run the daemon as a regular user on a high port (8730) instead.
# Send the daemon config over to the home dir
scp client-rsyncd.conf [email protected]:
# Launch rsync in daemon mode on the unprivileged port
ssh [email protected] rsync --daemon --port=8730 --config ./client-rsyncd.conf
# Send the data over
rsync \
--archive \
--delete \
--human-readable \
--inplace \
--itemize-changes \
--no-group \
--no-owner \
--no-perms \
--omit-dir-times \
--progress \
--recursive \
--verbose \
--stats \
/mnt/pool01/media/movies rsync://client.some.lan:8730/media
# Terminate the remote instance
ssh [email protected] killall rsync
1.3 - Tunneled Rsync
One common task is to rsync through a bastion host to an internal system. Do it with rsync's --rsh option, giving ssh a ProxyCommand that hops through the bastion:
rsync \
--archive \
--delete \
--delete-excluded \
--exclude "lost+found" \
--human-readable \
--inplace \
--progress \
--rsh='ssh -o "ProxyCommand ssh [email protected] -W %h:%p"' \
--verbose \
[email protected]:/srv/plex/* \
/data/
Newer versions of OpenSSH (7.3 and later) also have a -J (ProxyJump) option that does the same thing more concisely.
https://superuser.com/questions/964244/rsyncing-directories-through-ssh-tunnel
https://unix.stackexchange.com/questions/183951/what-do-the-h-and-p-do-in-this-command
https://superuser.com/questions/1115715/rsync-files-via-intermediate-host
1.4 - Rsync Schedule
It's usually best to wrap rsync in a script and call it from cron. Preferably, use something that doesn't step on itself during long-running syncs, like this:
vi ~/bin/schedule-rsync
#!/bin/bash
THE_USER="remote-account-1"
THE_KEY="remote-account-1"
SCRIPT_NAME=$(basename "$0")
PIDOF=$(pidof -x $SCRIPT_NAME)
for PID in $PIDOF; do
if [ $PID != $$ ]; then
echo "[$(date)] : $SCRIPT_NAME : Process is already running with PID $PID"
exit 1
fi
done
# Change to working directory. Assume running as non-root user per cronfile config below
cd ~/bin
rsync \
--archive \
--bwlimit=5m \
--delete \
--delete-excluded \
--exclude '.DS*' \
--exclude '._*' \
--human-readable \
--inplace \
--itemize-changes \
--no-group \
--no-owner \
--no-perms \
--progress \
--recursive \
--rsh "ssh -i ${THE_KEY}" \
--verbose \
--stats \
${THE_USER}@some.server.org\
:/mnt/pool01/folder.1 \
:/mnt/pool01/folder.2 \
:/mnt/pool01/folder.2 \
/mnt/pool02/
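An alternative to the pidof loop is flock(1) from util-linux. This is a different technique from the one in the script. This sketch assumes flock is installed; LOCK_FILE and run_exclusive are hypothetical names. It ties the lock to an open file descriptor, which the kernel releases automatically when the process exits, so a crashed run can't leave a stale lock behind:

```shell
#!/usr/bin/env bash
# Alternative sketch using flock(1) instead of pidof.
# LOCK_FILE is a hypothetical path; any file the cron user can write works.
LOCK_FILE=${LOCK_FILE:-/tmp/schedule-rsync.lock}

# Run the given command only if no other run holds the lock.
run_exclusive() {
  (
    # -n: give up immediately instead of waiting for the other run
    if ! flock -n 9; then
      echo "[$(date)] : previous sync still running, skipping" >&2
      exit 1
    fi
    "$@"
  ) 9>"$LOCK_FILE"
}

# In schedule-rsync this would wrap the existing command, e.g.:
#   run_exclusive rsync --archive ... ${THE_USER}@some.server.org:/mnt/pool01/folder.1 /mnt/pool02/
```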
Then make it executable and call it from a file in the cron drop folder.
chmod +x ~/bin/schedule-rsync
echo "0 1 * * * $USER /home/$USER/bin/schedule-rsync >> /home/$USER/bin/rsync-video.log 2>&1" | sudo tee /etc/cron.d/schedule-rsync
1.5 - Rsync Without Login
You'd like to use rsync, but ensure users can only use rsync. They shouldn't be able to log in with a shell, forward sessions, or get up to other shenanigans. Do this with SSH keys and a forced command.
Limit Use With Keys and a Custom Script
# On your server, add a central location for keys
sudo mkdir /etc/ssh/authorized_keys
# Configure SSH to look for user public keys in that spot - the %u is the variable for user ID
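# Keep the default per-user file in the list so existing users' keys still work
echo "AuthorizedKeysFile .ssh/authorized_keys /etc/ssh/authorized_keys/%u.pub" | sudo tee /etc/ssh/sshd_config.d/authorized_users.conf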
# Create a script that checks incoming ssh commands to make sure they are for rsync
sudo tee /etc/ssh/authorized_keys/checkssh.sh << "EOF"
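#!/bin/bash
if [ -n "$SSH_ORIGINAL_COMMAND" ]; then
  # Note the two spaces: an escaped space for the regex, then the one before ]]
  if [[ "$SSH_ORIGINAL_COMMAND" =~ ^rsync\  ]]; then
    echo "$SSH_ORIGINAL_COMMAND" | systemd-cat -t rsync
    # Unquoted on purpose, so the string is split back into separate arguments
    exec $SSH_ORIGINAL_COMMAND
  else
    echo "DENIED $SSH_ORIGINAL_COMMAND" | systemd-cat -t rsync
  fi
fi
# Anything not exec'd above, including an interactive login, is refused
exit 1
EOF
sudo chmod +x /etc/ssh/authorized_keys/checkssh.sh
sudo systemctl restart ssh.service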
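Before relying on the filter, it can be worth checking it against a few sample commands. This sketch copies the regex from checkssh.sh into a hypothetical is_allowed function. The sample rsync command line is only an example of the --server form a client sends; its flags vary by version:

```shell
#!/usr/bin/env bash
# Hypothetical helper: the same test checkssh.sh applies to
# SSH_ORIGINAL_COMMAND (the command must start with "rsync ").
is_allowed() {
  [[ "$1" =~ ^rsync\  ]]
}

for cmd in \
  'rsync --server --sender -vlogDtpre.iLsfxC . /some/folder' \
  'rsync-fake --evil' \
  'bash -i' \
  ''; do
  if is_allowed "$cmd"; then echo "ALLOW  $cmd"; else echo "DENY   $cmd"; fi
done
```

Only the first command is allowed. Note that the check is purely a prefix test: any rsync server invocation passes, including writes to any path the account can reach.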
Now that we have the SSH server configured, let's create a system user (required, unfortunately) and a key. We'll limit the account as much as possible. However, you can't use the /usr/sbin/nologin shell, as sshd runs the forced command through the user's shell.
THE_USER="remote-account-1"
sudo adduser --no-create-home --home /nonexistent --disabled-password --gecos "" ${THE_USER}
# It's easiest to create the key yourself, but a .pub from them is also fine.
# Send the private key (the file without .pub on the end) to the remote system, then delete it from this server.
ssh-keygen -f /etc/ssh/authorized_keys/${THE_USER} -q -N "" -C "${THE_USER}"
Let's add a forced command to the public key so that it can only be used with the features we allow.
sudo vi /etc/ssh/authorized_keys/${THE_USER}.pub
# Paste this command="... string in front of the existing key
command="/etc/ssh/authorized_keys/checkssh.sh",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAAB3NzaC1...
This remote user can now use rsync, but can’t login or do other activities. Their command would look something like this (using the private key you created above)
rsync \
--rsh "ssh -i /where/your/private/key/is/remote-account-1" \
[email protected]:/some/folder /some/local/place/
Notes
Why not use rrsync?
The rrsync script is similar to the script we use, but is distributed and maintained as part of the rsync package. It's arguably a better choice. I like the checkssh.sh approach because it's more flexible, can be extended to allow things other than rsync, and doesn't force relative paths. But if you're only doing rsync, consider using rrsync like this:
# Paste this command="... string in front of the existing key
command="rrsync -ro /some/folder/share",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAAB3NzaC1...
In your client's rsync command, make the paths relative to the path rrsync expects above.
rsync [email protected]:folder.1 /destination/folder/
If you see the client-side error message:
rrsync error: option -L has been disabled on this server
That means following symlinks has been disabled by default in rrsync. You can enable it with an edit to the script.
sudo sed -i 's/KLk//' /usr/bin/rrsync
# This changes
#   short_disabled_subdir = 'KLk'
# to
#   short_disabled_subdir = ''
Script It
If you need to do it more than once, it might look something like this.
#!/bin/bash
HELP_MESSAGE="Usage: $0 <user> \n\nThis script requires a username to be specified.\n"
if [ "$#" -eq 0 ]; then
  echo -e "$HELP_MESSAGE"
  exit 1
fi
if [ "$EUID" -ne 0 ]; then
  echo "This script must be run with sudo."
  exit 1
fi
THE_USER=$1
# Run id directly; wrapping it in [ ] would only test a string
if id "${THE_USER}" &>/dev/null; then
  echo "User already exists."
  exit 1
fi
THE_COMMAND="\
command=\
\"/etc/ssh/authorized_keys/checkssh.sh\",\
no-port-forwarding,\
no-X11-forwarding,\
no-agent-forwarding,\
no-pty "
useradd --no-create-home --home-dir /nonexistent "${THE_USER}"
mkdir -p /etc/ssh/authorized_keys
ssh-keygen -f "/etc/ssh/authorized_keys/${THE_USER}" -q -N "" -C "${THE_USER}"
# Prefix the public key with the forced command and restrictions
sed -i "1s|^|$THE_COMMAND|" "/etc/ssh/authorized_keys/${THE_USER}.pub"
Sources
https://peterbabic.dev/blog/transfer-files-between-servers-using-rrsync/
http://gergap.de/restrict-ssh-to-rsync.html
https://superuser.com/questions/641275/make-linux-server-allow-rsync-scp-sftp-but-not-a-terminal-login
2 - Tar Pipe
AKA - The Fastest Way to Copy Files.
When you don't want to copy a whole file system, many admins suggest the fastest way is a 'tar pipe'.
Locally
From one disk to another on the same system. This uses pv to buffer and report throughput. The && means tar won't run in the wrong directory if a cd fails.
(cd /src && tar cpf - .) | pv -trab -B 500M | (cd /dst && tar xpf -)
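Where pv isn't installed, the same pipe works without it. This sketch uses a hypothetical copy_tree helper. It relies on tar's -C option instead of cd, so a mistyped directory makes tar fail rather than archive whatever directory you happen to be in:

```shell
#!/usr/bin/env bash
# Hypothetical helper: local tar pipe without pv.
# -C switches directory inside tar itself; "." takes the whole tree,
# including dotfiles; -p keeps permissions on extraction.
copy_tree() {
  local src=$1 dst=$2
  mkdir -p "$dst"
  tar -C "$src" -cpf - . | tar -C "$dst" -xpf -
}

# Usage: copy_tree /src /dst
# To get the pv buffer back, put it between the two tars:
#   tar -C /src -cpf - . | pv -trab -B 500M | tar -C /dst -xpf -
```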
Across the network
NetCat
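You can add netcat to the mix (as long as you don't need encryption) to get the data across the network. The -l -p syntax below is for traditional netcat. Some variants, such as OpenBSD's, take the port without -p (nc -l 8989).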
On the receiver:
(change to the directory you want to receive the files or directories in)
nc -l -p 8989 | tar -xpzf -
On the sender:
(change to the directory that has the file or directory - like ‘pics’ - in it)
tar -czf - pics | nc some.server 8989
mbuffer
This takes the place of pv and nc and is somewhat faster.
On the receiving side
mbuffer -4 -I 9090 | tar -xf -
On the sending side
sudo tar -c plexmediaserver | mbuffer -m 1G -O SOME.IP:9090
SSH
You can use ssh when netcat isn't appropriate, or when you want to automate with an SSH key and limited interaction with the other side. This example pulls from a remote server.
(ssh [email protected] tar -czf - /srv/http/someSite) | (tar -xzf -)
NFS
If you already have an NFS server on one of the systems, though, it's basically just as fast. At least in informal testing, it behaves more steadily than a tar pipe, which shows higher peaks and lower troughs. A simple cp -a will suffice, though for lots of little files a tar pipe may still be faster.
rsync
rsync is generally the best choice if you expect the transfer to be interrupted, since it can pick up where it left off. In my testing, rsync achieved about 15% less throughput with about 10% more processor overhead.
http://serverfault.com/questions/43014/copying-a-large-directory-tree-locally-cp-or-rsync
http://unix.stackexchange.com/questions/66647/faster-alternative-to-cp-a
http://serverfault.com/questions/18125/how-to-copy-a-large-number-of-files-quickly-between-two-servers
3 - Unison
Unison offers several features that make it more useful than rsync:
- Multi-Way File Sync
- Detect Renames and Copies
- Delta copies
Multi-Way File Sync
Rsync is good at one-way synchronization, i.e. one to many. But when you need to sync multiple authoritative systems, i.e. many to many, you want to use Unison. It allows you to merge changes.
Detect Renames and Copies (xferbycopying)
Another problem with rsync is that when you rename a file, it re-sends it. This is because a renamed file appears 'new' to the sync utility. Unison, however, maintains a hash of every file you've synced. If there is already a local copy (i.e. the file before you renamed it), it uses that and does a local copy rather than sending the file. So a rename effectively becomes a local copy and a delete. Not perfect, but better than sending it across the wire.
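The rename trick can be illustrated with a toy version: before fetching a file, look for local content with the same hash and copy that instead. This is only a sketch of the idea using sha256sum and two hypothetical helpers. Unison's own archive format and matching logic are different:

```shell
#!/usr/bin/env bash
# Toy illustration of copy-instead-of-transfer, NOT Unison's implementation.
# find_local_copy: print the first file under DIR whose SHA-256 matches HASH.
find_local_copy() {
  local want=$1 dir=$2 f
  while IFS= read -r -d '' f; do
    if [ "$(sha256sum < "$f" | cut -d' ' -f1)" = "$want" ]; then
      printf '%s\n' "$f"
      return 0
    fi
  done < <(find "$dir" -type f -print0)
  return 1
}

# place_file: given the hash the other side reported for TARGET, copy a
# matching local file into place; otherwise a real tool would transfer it.
place_file() {
  local want=$1 dir=$2 target=$3 existing
  if existing=$(find_local_copy "$want" "$dir"); then
    cp -p "$existing" "$target" && echo "local copy from $existing"
  else
    echo "needs transfer"
  fi
}
```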
Delta Copies
Unison uses its own implementation of the rsync delta-copy algorithm. However, for large files the authors recommend an option that hands the transfer to rsync itself, which you can tune for large files. Unison can use config files (profiles) in your ~/.unison folder. If you type unison without any arguments, it will use the default.prf file. Here is a sample:
# Unison preferences file
# Here are the two server 'roots' i.e., the start of where we will pick out things to sync.
# The first root is local, and the other remote over ssh
root = /mnt/someFolder
root = ssh://[email protected]//mnt/someFolder
# The 'path' is simply the name of a folder or file you want to sync. Notice the spaces are preserved. Do not escape them.
path = A Folder Inside someFolder
# We're 'forcing' the first root to win all conflicts. This sort of negates the multi-way
# sync feature but it's just an example
force = /mnt/someFolder
# This instructs unison to copy the contents of sym links, rather than the link itself
follow = Regex .*
# You can also ignore files and paths explicitly or by pattern. See 'Path specification' in the manual
ignore = Name .AppleDouble
ignore = Name .DS_Store
ignore = Name .Parent
ignore = Name ._*
# Here we are invoking an external engine (rsync) when a file is over 10MB (copythreshold is in kilobytes), and passing it some arguments
copythreshold = 10000
copyprog = rsync --inplace
copyprogrest = rsync --partial --inplace
Hostname is important. Unison builds a hash of all the files to determine what's changed (similar to --checksum with rsync, but faster), and it keys that archive on the hostname and the roots. If you get repeated messages about '...first time being run...', you may have an error in your path, or your hostname may have changed.
http://www.cis.upenn.edu/~bcpierce/unison/download/releases/stable/unison-manual.html