This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

rsync

This is used enough that it deserves several pages.

1 - Basic Rsync

If you regularly copy lots of files it’s best to use rsync. It’s efficient, as it only copies what you need, and secure, being able to use SSH. Many other tools such as BackupPC, Duplicity etc. use rsync under the hood, and when you are doing cross-platform data replication it may be the only tool that works, so you’re best to learn it.

Local Copies

Generally, it’s 10% slower than just using cp -a. Sometimes start with that and finish up with this.

rsync \
--archive \
--delete \
--dry-run \
--human-readable \
--inplace \
--itemize-changes \
--progress \
--verbose \
/some/source/Directory \
/some/destination/

The explanations of the more interesting options are:

--archive: Preserves all the metadata, as you'd expect
--delete : Removes extraneous files at the destination that no longer exist at the source (i.e. _not_ a merge)
--dry-run: Makes no changes. This is important for testing. Remove for the actual run
--inplace: This overwrites the file directly, rather than the default behavior that is to build a copy on the other end before moving it into place. This is slightly faster and better when space is limited (I've read)

If you don’t trust the timestamps at your destination, you can add the --checksum option, though when you’re local this may be slower than just recopying the whole thing.

A note about trailing slashes: In the source above, there is no trailing slash. But we could have added one, or even a /*. Here’s what happens when you do that.

  • No trailing slash - This will sync the directory as you’d expect.
  • Trailing slash - It will sync the contents of the directory to the location, rather than the directory itself.
  • Trailing /* - Try not to do this. It will sync each of the items in the source directory as if you had typed them individually. but not delete destination files that no longer exist on source, and so everything will be a merge regardless of if you issued the –delete parameter.

Across the Network

This uses SSH for encryption and authentication.

rsync \
--archive \
--delete \
--dry-run \
--human-readable \
--inplace \
--itemize-changes \
--progress \
--verbose \
/srv/Source_Directory/* \
[email protected]:/srv/Destination_Directory

Windows to Linux

One easy way to do this is to grab a bundled version of rsync and ssh for windows from the cwRsync folks

<https://www.itefix.net/content/cwrsync-free-edition>

Extract the standalone client to a folder and edit the .cmd file to add this at the end (the ^ is the windows CRNL escape)

rsync ^
--archive ^
--delete ^
--dry-run ^
--human-readable ^
--inplace ^
--itemize-changes ^
--no-group ^
--no-owner ^
--progress ^
--verbose ^
--stats ^
[email protected]:/srv/media/video/movies/* /cygdrive/D/Media/Video/Movies/

pause

Mac OS X to Linux

The version that comes with recent versions of OS X is a 2.6.9 (or so) variant. You can use that, or obtain the more recent 3.0.9 that has some slight speed improvements and features. To get the newest (you have to build it yourself) install brew, then issue the commands:

brew install https://raw.github.com/Homebrew/homebrew-dupes/master/rsync.rb
brew install rsync

One of the issues with syncing between OS X and Linux is the handling of Mac resource forks (file metadata). Lets assume that you are only interested in data files (such as mp4) and are leaving out the extended attributes that apple uses to store icons and other assorted data (replacing the old resource fork).

Since we are going between file systems, rather than use the ‘a’ option that preserves file attributes, we specify only ‘recursive’ and ’times’. We also use some excludes keep mac specific files from tagging along.

/usr/local/bin/rsync 
    --exclude .DS*
    --exclude ._*        
    --human-readable 
    --inplace 
    --progress 
    --recursive  
    --times 
    --verbose 
    --itemize-changes 
    --dry-run       
    "/Volumes/3TB/source/" 
    [email protected]:"/Volumes/3TB/"

Importantly, we are ‘itemizing’ and doing a ‘dry-run’. When you do, you will see a report like:

skipping non-regular file "Photos/Summer.2004"
skipping non-regular file "Photos/Summer.2005"
.d..t....... Documents/
.d..t....... Documents/Work/
cd++++++++++ ISOs/
<f++++++++++ ISOs/Office.ISO

The line with cd+++ indicate a directory will be created and <f+++ indicate a file is going to be copied. When it says ‘skipping’ a non regular file, that’s (in this case, at least) a symlink. You can include them, but if your paths don’t match up on both systems, these links will fail.

Spaces in File Names

Generally you quote and escape.

rsync 
  --archive ^
  --itemize-changes ^
  --progress ^
  [email protected]:"/srv/media/audio/Music/Basil\ Poledouris" ^
  /cygdrive/c/Users/Allen/Music

Though it’s rumored that you can single quote and escape with the –protect-args option

--protect-args ^
[email protected]:'/srv/media/audio/Music/Basil Poledouris' ^

List of Files

You may want to combine find and rsync to get files of a specific criteria. Use the --from-file parameter

ssh server.gattis.org find /srv/media/video -type f -mtime -360 > list

rsync --progress --files-from=list server.gattis.org:/ /mnt/media/video/

Seeding an Initial Copy

If you have no data on the destination to begin with, rsync will be somewhat slower than a straight copy. On a local system simply use ‘cp -a’ (to preserve file times). On a remote system, you can use tar to minimize the file overhead.

tar -c /path/to/dir | ssh remote_server 'tar -xvf - -C /absolute/path/to/remotedir'

It is also possible to use rsync with the option --whole-file and this will skip the things that slow rsync down though I have not tested it’s speed

Time versus size

Rsync uses time and size to determine if a file should be updated. If you have already copied files and you are trying to do a sync, you may find your modification times are off. Add the –size-only or the –modify-window=NUM. Even better, correct your times. (this requires on OS X the coreutils to get the GNU ls command and working with the idea here)

http://notemagnet.blogspot.com/2009/10/getting-started-with-rsync-for-paranoid.html http://www.chrissearle.org/blog/technical/mac_homebrew_and_homebrew_alt http://ubuntuforums.org/showthread.php?t=1806213

2 - Scheduled Rsync

Running rsync via cron has been around a long time. Ideally, you use public keys and limit the account. You do it something like this.

  • On the source
    • Configure SSHD to handle user keys
    • Create a control script to restrict users to rsync
    • Add an account specific to backups
    • Generate user keys and limit to the control script
  • On the destination
    • Copy the private key
    • Create a script and cronjob

Source

# Add a central location for keys and have sshd look there. Notice the
# '%u'. It's substituted with user ID at login to match the correct filename
sudo mkdir /etc/ssh/authorized_keys
echo "AuthorizedKeysFile /etc/ssh/authorized_keys/%u.pub" > /etc/ssh/sshd_config.d/authorized_users.conf
systemctl restart ssh.service

# Create the script logic that makes sure it's an rsync command. You can modify this to allow other cmds as needed.
sudo tee /etc/ssh/authorized_keys/checkssh.sh << "EOF"
#!/bin/bash
if [ -n "$SSH_ORIGINAL_COMMAND" ]; then
    if [[ "$SSH_ORIGINAL_COMMAND" =~ ^rsync\  ]]; then
        echo $SSH_ORIGINAL_COMMAND | systemd-cat -t rsync
        exec $SSH_ORIGINAL_COMMAND
    else
        echo DENIED $SSH_ORIGINAL_COMMAND | systemd-cat -t rsync
    fi
fi
EOF

chmod +x /etc/ssh/authorized_keys/checkssh.sh

# Add the user account and create keys for them
THE_USER="backup-account-1"
sudo adduser --no-create-home --home /nonexistent --disabled-password --gecos "" ${THE_USER}
ssh-keygen -f /etc/ssh/authorized_keys/${THE_USER} -q -N "" -C "${THE_USER}"

# Add the key stipulations that invoke the script and limit ssh options.
# command="/etc/ssh/authorized_keys/checkssh.sh\",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty 
THE_COMMAND="\
command=\
\"/etc/ssh/authorized_keys/checkssh.sh\",\
no-port-forwarding,\
no-X11-forwarding,\
no-agent-forwarding,\
no-pty "

# Insert the command in front of the user's key - the whole file remains a single line
sed -i "1s|^|$THE_COMMAND|" /etc/ssh/authorized_keys/${THE_USER}.pub

# Finally, copy the account's private key to the remote location
scp /etc/ssh/authorized_keys/${THE_USER} [email protected]:

Destination

It’s usually best to create a script that uses rsync and call that from cron. Preferably one that doesn’t step on itself for long running syncs. Like this:

vi ~/schedule-rsync
#!/bin/bash

THE_USER="backup-account-1"
THE_KEY="~/backup-account-1" # If you move the key, make sure to adjust this

SCRIPT_NAME=$(basename "$0")
PIDOF=$(pidof -x $SCRIPT_NAME)

for PID in $PIDOF; do
    if [ $PID != $$ ]; then
        echo "[$(date)] : $SCRIPT_NAME : Process is already running with PID $PID"
        exit 1
    fi
done



rsync \
--archive \
--bwlimit=5m \
--delete \
--delete-excluded \
--exclude .DS* \
--exclude ._* \
--human-readable \
--inplace \
--itemize-changes \
--no-group \
--no-owner \
--no-perms \
--progress \
--recursive \
--rsh "ssh -i ${THE_KEY}" \
--verbose \
--stats \
${THE_USER}@some.server.org\
:/mnt/pool01/folder.1 \
:/mnt/pool01/folder.2 \
:/mnt/pool01/folder.2 \
/mnt/pool02/

Then, call it from a file in the cron drop folder.

echo "0 1 * * * /home/$USER/schedule-rsync  >> /home/$USER/rsync-video.log 2>&1" > /etc/cron.d/schedule-rsync

Notes

Why not use rrsync?

The rrsync script is similar to the script we use, but is distributed and maintained as part of the rsync package. It’s arguably a better choice. I like the checkssh.sh approach as it’s more flexible, allows for things other than rsync, and doesn’t force relative paths. But if you’re only doing rsync, consider using rrsync like this;

THE_COMMAND="\
command=\
\"rrsync -ro /mnt/pool01\",\
no-port-forwarding,\
no-X11-forwarding,\
no-agent-forwarding,\
no-pty "

In your client’s rsync command, make the paths relative to path rrsync expects above.

rsync \
...
...
${THE_USER}@some.server.org\
:folder.1 \
:folder.2 \
:folder.2 \
/mnt/pool02/

If you see the client-side error message:

rrsync error: option -L has been disabled on this server

You discovered that following symlinks has been disabled by default in rrsync. You can enable with an edit to the script.

sudo sed -i 's/KLk//' /usr/bin/rrsync

# This changes
#    short_disabled_subdir = 'KLk'
        to
#    short_disabled_subdir = ''  

Troubleshooting

Sources

https://peterbabic.dev/blog/transfer-files-between-servers-using-rrsync/ http://gergap.de/restrict-ssh-to-rsync.html https://superuser.com/questions/641275/make-linux-server-allow-rsync-scp-sftp-but-not-a-terminal-login

3 - Rsync Daemon

Some low-power devices, such as the Raspbery Pi, struggle with the encryption overheard of rsync default network transport, ssh.

If you don’t need encryption or authentication, you can significantly speed things up by using rsync in daemon mode.

Push Config

In this example, we’ll push data from our server to the low-power client.

Create a Config File

Create a config file on the sever that we’ll send over to the client later.

nano client-rsyncd.conf
log file = /var/log/rsync.log
pid file = /var/run/rsyncd.pid
lock file = /var/run/rsync.lock

# This is the name you refer to in rsync. The path is where that maps to.
[media]
        path = /var/media
        comment = Media
        read only = false
        timeout = 300
        uid = you
        gid = you

Start and Push On-Demand

The default port is hi-level and doesn’t require root privileges.

# Send the daemon config over to the home dir
scp client-rsyncd.conf [email protected]:

# Launch rsync in daemon mode
ssh [email protected]: rsync --daemon --config ./client-rsyncd.conf

# Send the data over
rsync \
--archive \
--delete \
--human-readable \
--inplace \
--itemize-changes \
--no-group \
--no-owner \
--no-perms \
--omit-dir-times \
--progress \
--recursive \
--verbose \
--stats \
/mnt/pool01/media/movies rsync://client.some.lan:8730/media

# Terminate the remote instance
ssh [email protected] killall rsync

4 - Tunneled Rsync

One common task is to rsync through a bastion host to an internal system. Do it with the rsync shell options

rsync \
--archive \
--delete \
--delete-excluded \
--exclude "lost+found" \
--human-readable \
--inplace \
--progress \
--rsh='ssh -o "ProxyCommand ssh [email protected] -W %h:%p"' \
--verbose \
[email protected]:/srv/plex/* \
/data/

There is a -J or ProxyJUmp option on new versions of SSH as well.

https://superuser.com/questions/964244/rsyncing-directories-through-ssh-tunnel https://unix.stackexchange.com/questions/183951/what-do-the-h-and-p-do-in-this-command https://superuser.com/questions/1115715/rsync-files-via-intermediate-host