rsync

If you regularly copy files across a network, rsync is probably the best tool because it transfers only what has changed. Many other tools, such as BackupPC and Duplicity, use rsync under the hood, and when you are doing cross-platform data replication it may be the only tool that works, so it's worth learning.

Linux 

Locally

Generally, it's about 10% slower than just using cp -a, but if there's a chance of interruption or change while copying, you'll have to run rsync at the end anyway. You can resume a copy with cp -an, but the -n (no-clobber) option looks only at file names, so a partially copied file is left as it is. We've escaped the newlines here to make the command easier to edit in a file.

rsync \
--archive \
--delete \
--dry-run \
--human-readable \
--inplace \
--progress \
--verbose \
/srv/Source_Directory/ \
/mnt/Destination_Directory/

The explanations of the more interesting options are:

--archive: Recurses and preserves symlinks, permissions, times, owner, group and device files. It does not preserve hard links, ACLs or extended attributes; add --hard-links, --acls or --xattrs for those
--delete: Removes extraneous files at the destination that no longer exist at the source (i.e. _not_ a merge)
--dry-run: Makes no changes. This is important for testing. Remove for the actual run
--inplace: Overwrites the destination file directly, rather than the default behavior of building a copy on the other end before moving it into place. This is slightly faster and better when space is limited (I've read), but an interrupted transfer leaves the destination file in an inconsistent state

If you don't trust the timestamps at your destination, you can use the --checksum option, though on a local copy this may be slower than just recopying the whole thing, because every file has to be read on both sides.

A note about trailing slashes: If you remove the trailing slash from the source, rsync copies the directory itself, as you'd expect. If you add the trailing slash as we do above, it operates on the contents of the source directory and deletes as expected. If you add a trailing /*, the shell expands the glob before rsync runs, so rsync operates on the contents but never sees the directory itself. It will not delete top-level destination files that no longer exist on the source, and hidden files are skipped, so the result is a merge regardless of whether you passed --delete.
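The three cases can be tried safely in a sandbox. This is a minimal sketch, assuming a local rsync and a POSIX shell; the directory names under the temporary folder (src, no_slash, slash, glob) are made up for the illustration.

```shell
# Sandbox demo of trailing-slash behavior; every path is a throwaway temp dir.
work=$(mktemp -d)
mkdir -p "$work/src/sub" "$work/no_slash" "$work/slash" "$work/glob"
echo data > "$work/src/file.txt"
echo hidden > "$work/src/.hidden"
# A stale file that exists only at the destination.
touch "$work/slash/stale.txt" "$work/glob/stale.txt"

# 1. No trailing slash: the directory itself is copied (no_slash/src/...).
rsync --archive "$work/src" "$work/no_slash/"

# 2. Trailing slash: the contents are copied and --delete removes stale.txt.
rsync --archive --delete "$work/src/" "$work/slash/"

# 3. Trailing /*: the shell expands the glob first, so stale.txt survives
#    at the top level and .hidden is never passed to rsync.
rsync --archive --delete "$work"/src/* "$work/glob/"

ls -A "$work/no_slash" "$work/slash" "$work/glob"
```

Comparing the three listings shows which form behaves as a mirror and which as a merge.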

Across the Network

We've added a couple of excludes here, so that Mac metadata files (the "._" and ".DS" files) are not copied.

rsync \
--archive \
--delete \
--dry-run \
--exclude '._*' \
--exclude '.DS*' \
--human-readable \
--inplace \
--itemize-changes \
--progress \
--verbose \
/srv/Source_Directory/ \
you@server.gattis.org:/srv/Destination_Directory

Windows to Linux

One easy way to do this is to grab a bundled version of rsync and ssh for Windows from the cwRsync folks:

https://www.itefix.net/content/cwrsync-free-edition

Extract the standalone client to a folder and edit the .cmd file to add this at the end (the ^ is the Windows line-continuation character):

rsync ^
--archive ^
--delete ^
--dry-run ^
--exclude .DS* ^
--exclude ._* ^
--exclude .sync* ^
--exclude .Trash* ^
--human-readable ^
--inplace ^
--itemize-changes ^
--no-group ^
--no-owner ^
--progress ^
--verbose ^
--stats ^
allen@gargoyle.gattis.org:/srv/media/video/movies/ /cygdrive/D/Media/Video/Movies/

pause


Mac OS X to Linux

The version that comes with recent versions of OS X is a 2.6.9 (or so) variant. You can use that, or obtain the more recent 3.0.9, which has some slight speed improvements and extra features. To get the newest version (you have to build it yourself), install brew, then issue the commands:

brew install https://raw.github.com/Homebrew/homebrew-dupes/master/rsync.rb
brew install rsync

One of the issues with syncing between OS X and Linux is the handling of Mac resource forks (file metadata). Let's assume that you are only interested in data files (such as mp4) and are leaving out the extended attributes that Apple uses to store icons and other assorted data (replacing the old resource fork).

Since we are going between file systems, rather than use the --archive option, which preserves file attributes, we specify only --recursive and --times. We also use an exclude file to keep things like "._" files from tagging along.
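The exclude file holds one pattern per line. A minimal sketch of creating it follows; the patterns are borrowed from the Windows example on this page and are only a starting point, not a complete list of Mac metadata names.

```shell
# Write a sample excludes.txt in the current directory, one pattern per line.
# Lines starting with '#' are ignored by rsync's --exclude-from.
cat > excludes.txt <<'EOF'
# Mac metadata that has no use on Linux
._*
.DS*
.Trash*
.sync*
EOF

cat excludes.txt
```

Quoting the heredoc delimiter ('EOF') keeps the shell from expanding the patterns while the file is written.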

/usr/local/bin/rsync \
    --exclude-from=excludes.txt \
    --human-readable \
    --inplace \
    --progress \
    --recursive \
    --times \
    --verbose \
    --itemize-changes \
    --dry-run \
    "/Volumes/3TB/Original Folder/" \
    you@server.gattis.org:"/Volumes/3TB/Original\ Folder"

The above is broken out so the options can be read easily. Importantly, we are itemizing (--itemize-changes) and doing a dry run (--dry-run). When you do, you will see a report like:


skipping non-regular file "Photos/Summer.2004"
skipping non-regular file "Photos/Summer.2005"
.d..t....... Documents/
.d..t....... Documents/Work/
cd++++++++++ ISOs/
<f++++++++++ ISOs/Office.ISO

A line starting with 'cd+++' indicates that a directory will be created, and '<f+++' indicates that a file will be sent to the remote host. 'skipping non-regular file' refers (in this case, at least) to a symlink, which is skipped because --links is not set. You can include symlinks by adding --links, but if your paths don't match up on both systems, the links will be broken.
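The first two characters of each itemized line carry most of the meaning: the update type, then the file type. As a reading aid, here is a small sketch of a decoder. The function name describe_item is hypothetical (it is not part of rsync), and it covers only the leading two characters as described in the rsync man page.

```shell
# Hypothetical helper: explain the first two characters of an
# --itemize-changes line (update type, then file type).
describe_item() {
  case "$1" in
    '*'*) printf 'message: %s\n' "${1#?}"; return ;;
    '<'*) update="sent to the remote host" ;;
    '>'*) update="received by the local host" ;;
    c*)   update="created or changed locally" ;;
    h*)   update="hard link to another item" ;;
    .*)   update="not transferred, attributes only" ;;
    *)    update="unknown update type" ;;
  esac
  case "$(printf '%s' "$1" | cut -c2)" in
    f) kind="file" ;;
    d) kind="directory" ;;
    L) kind="symlink" ;;
    D) kind="device" ;;
    S) kind="special file" ;;
    *) kind="item" ;;
  esac
  printf '%s: %s\n' "$kind" "$update"
}

describe_item 'cd++++++++++ ISOs/'
describe_item '<f++++++++++ ISOs/Office.ISO'
describe_item '.d..t....... Documents/'
```

The remaining characters flag which attributes differ (for example, t for modification time), and a run of + signs marks a newly created item.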

Seeding an Initial Copy

If you have no data on the destination to begin with, rsync can be somewhat slower than a straight copy. On a local system, simply use 'cp -a' (to preserve file times). On a remote system, you can use tar to minimize the per-file overhead.


tar -cf - /path/to/dir | ssh remote_server 'tar -xvf - -C /absolute/path/to/remotedir'
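The same pipe can be rehearsed locally before pointing it at a server. This is a sketch under the assumption that both ends are temporary directories on one machine; the ssh hop is left out and the directory names are invented for the test.

```shell
# Local stand-in for the ssh pipe: both ends are temp directories.
seed_src=$(mktemp -d)
seed_dst=$(mktemp -d)
mkdir -p "$seed_src/dir/sub"
echo hello > "$seed_src/dir/sub/file.txt"

# -C changes directory before archiving, so the archive stores the relative
# path dir/... instead of an absolute one; -p keeps permissions on extract.
tar -cf - -C "$seed_src" dir | tar -xpf - -C "$seed_dst"

ls -R "$seed_dst"
```

Using -C on the sending side also controls where the tree lands under the remote directory, since only the relative path travels through the pipe.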

It is also possible to use rsync with the --whole-file option, which skips the delta-transfer algorithm that slows rsync down, though I have not tested its speed.

Time versus size

Rsync uses modification time and size to determine if a file should be updated. If you have already copied files and are now trying to sync them, you may find your modification times are off. Add --size-only to compare by size alone, or --modify-window=NUM to treat times within NUM seconds of each other as equal. Even better, correct your times. (On OS X this requires coreutils, to get the GNU ls command, and working with the idea here.)
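To see how these options change rsync's decision, the sketch below builds two identical files whose modification times differ by one second, then counts how many files a dry run would re-transfer. It assumes a local rsync; the helper name would_copy and the temp paths are made up for the illustration.

```shell
# Two copies of the same file, one second apart in modification time.
tw=$(mktemp -d)
mkdir "$tw/src" "$tw/dst"
echo "same content" > "$tw/src/file.txt"
cp "$tw/src/file.txt" "$tw/dst/file.txt"
touch -t 202001011200.00 "$tw/src/file.txt"
touch -t 202001011200.01 "$tw/dst/file.txt"

# Count files a dry run would re-transfer (itemized lines starting with ">f").
would_copy() {
  rsync --recursive --times --dry-run --itemize-changes "$@" \
    "$tw/src/" "$tw/dst/" | grep -c '^>f'
}

would_copy                      # default check on size and time
would_copy --size-only          # sizes match, so time is ignored
would_copy --modify-window=1    # a one-second difference is tolerated
```

Lines that start with a dot rather than ">f" are not counted: they mean rsync would only adjust attributes such as the timestamp, without copying data.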




http://notemagnet.blogspot.com/2009/10/getting-started-with-rsync-for-paranoid.html
http://www.chrissearle.org/blog/technical/mac_homebrew_and_homebrew_alt
http://ubuntuforums.org/showthread.php?t=1806213
