How to use rsync to transfer files and directories between hosts

The rsync utility can back up files, synchronize directory trees, and much more, both on the local machine and between two different hosts, via push and pull. Here is how to tame it.

This article explains:

how to synchronize the contents of two local directories
how to perform a secure rsync via ssh:
- an rsync push from the local host to a remote host
- an rsync pull from a remote host to the local host.

It goes without saying that rsync can have disastrous consequences if used incorrectly, so whenever in doubt, start with a --dry-run and always think before you type.

rsync for starters: synchronizing the contents of two local directories

To synchronize the contents of two directories on your local machine, use the rsync utility like this:

rsync -r source-directory/ destination-directory

The trailing slash / ensures that the contents of your source-directory are transferred into the destination-directory (in this case, on the same host).

Without the trailing slash, your source-directory itself would become a subdirectory of your destination-directory (in this case, both on the same host).

The -r option sets the recursive mode. It tells rsync to traverse subdirectories.

Remote rsync via ssh

Synchronizing local directories using rsync is a piece of cake, but what if you want to securely rsync files and directories between two separate hosts using ssh with public key authentication over the network? Easy.

There are two ways to accomplish this:

a push operation transfers data from the local host to the remote host,

and the other way around:

a pull operation transfers data from the remote host to the local host.

rsync push: transfer data from your local host to a remote host

To initiate an rsync push from the local host to a remote host (or from one remote host to another) via ssh, you can use this command at the prompt of the host that will push its data:

rsync -azhe "ssh -p 22 -i /path/to/.ssh/keypair/private_key_file" /local/path/to/source-directory remoteuser@re.mo.te.ip:/remote/path/to/destination-directory
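If you push to the same host regularly, the command above can be wrapped in a small shell function. This is only a sketch under assumptions: the function name push_dir, the key file name id_ed25519, and the address 192.0.2.10 (an address reserved for documentation) are placeholders, not values from this article.

```shell
#!/bin/sh
# Hypothetical placeholders: adjust all three values to your environment.
SSH_PORT=22
SSH_KEY="$HOME/.ssh/id_ed25519"     # assumed location of the private key
REMOTE="remoteuser@192.0.2.10"      # documentation-only example address

# push_dir SRC DEST [extra rsync options...]
# Pushes SRC from this host to DEST on the remote host. Extra arguments
# (for example --dry-run -v) are passed through to rsync unchanged.
# Note: rsync splits the -e string on spaces, so the key path must not
# contain spaces in this simple form.
push_dir() {
    src=$1
    dest=$2
    shift 2
    rsync -azh -e "ssh -p $SSH_PORT -i $SSH_KEY" "$@" "$src" "$REMOTE:$dest"
}

# Example calls (commented out so that nothing is transferred by accident):
# push_dir /local/path/to/source-directory /remote/path/to/destination-directory --dry-run -v
# push_dir /local/path/to/source-directory /remote/path/to/destination-directory
```

Running the first call with --dry-run -v before the second one shows what would be transferred without touching the remote host.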

rsync pull: transfer data from a remote host to your local host

To initiate an rsync pull from a remote host to the local host, you need to set up ssh with the private key placed locally (preferably in $HOME/.ssh of the active local user) and the corresponding public key located on the remote computer (on a separate line inside the file authorized_keys in $HOME/.ssh of the remote user).

To set up the keys, follow steps 1 through 5 in this tutorial: How to Set Up a Connection between Two Hosts Using Authentication Based on Key Pairs for Remote Access via ssh, rsync.

Once you have verified that you can establish an ssh connection, you can now use this command:

rsync -azhe "ssh -p 22 -i /path/to/.ssh/keypair/private_key_file" --progress remoteuser@<ip-address-or-domain>:/path/to/items/on/the/REMOTE/HOST/OBJECT /path/to/items/on/the/LOCAL/HOST/DESTINATION/

The -p flag inside the quoted ssh command specifies the ssh port number on the remote host to connect to, so if your remote ssh daemon listens on a different port, you need to replace the default 22 with that port number. The parent directory on the receiving machine (DESTINATION in the example above) must already exist. OBJECT can be a single file or directory, or several of them when you use wildcards.
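The points above (a non-default port, an existing destination, and wildcards in OBJECT) can be combined in a small wrapper. Treat it as a sketch: the function name pull_items, port 2222, the key file name, the address 192.0.2.10, and the log path are all assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical placeholders: adjust to your environment.
SSH_PORT=2222                       # assumed non-default ssh port
SSH_KEY="$HOME/.ssh/id_ed25519"     # assumed location of the private key
REMOTE="remoteuser@192.0.2.10"      # documentation-only example address

# pull_items REMOTE_PATH LOCAL_DEST [extra rsync options...]
# Pulls REMOTE_PATH from the remote host into LOCAL_DEST on this host.
pull_items() {
    remote_path=$1
    local_dest=$2
    shift 2
    # Make sure the local destination exists before rsync writes into it.
    mkdir -p "$local_dest"
    rsync -azh -e "ssh -p $SSH_PORT -i $SSH_KEY" --progress "$@" \
        "$REMOTE:$remote_path" "$local_dest/"
}

# Example call (commented out). The single quotes keep the local shell from
# expanding the wildcard, so it is expanded on the remote side instead:
# pull_items '/var/log/myapp/*.log' "$HOME/backups/logs" --dry-run
```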

User mapping in rsync

rsync version 3.1.0 introduced user mapping with the --usermap and --groupmap options. This allows you to specify ownership of files on the remote system like this:

--usermap=originowner:destinationowner
--groupmap=origingroup:destinationgroup

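Option -a makes rsync preserve ownership because it implies -o and -g, and the mappings only take effect when those options are active (the receiving side also needs sufficient privileges to change ownership). If you don't want everything else that -a implies, you can piece its functionality together from individual flags (see the section on useful rsync options below).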
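As an illustration, here is a sketch of a push that spells out the flags behind -a and adds both mappings. The function name push_mapped and the user and group names (alice, webadmin, staff, www-data) are hypothetical; the key path and host are the placeholders used elsewhere in this article. Connecting as root reflects the assumption that the receiving side must be allowed to change file ownership.

```shell
#!/bin/sh
# push_mapped SRC DEST
# Like a push with -a, but with the individual flags written out so that
# ownership handling is explicit. -rlptD plus --owner and --group is what
# -a stands for. All user and group names are made up for this example.
push_mapped() {
    rsync -rlptD --owner --group \
        --usermap=alice:webadmin \
        --groupmap=staff:www-data \
        -zh -e "ssh -p 22 -i /path/to/.ssh/keypair/private_key_file" \
        "$1" "root@re.mo.te.ip:$2"
}

# Example call (commented out so that nothing is transferred by accident):
# push_mapped /local/path/to/source-directory/ /remote/path/to/destination-directory
```

Files owned by alice on the sending side would then belong to webadmin on the receiving side, and files in group staff would end up in group www-data.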

rsync in the cloud with systems that don’t support root login

Many AMIs (Amazon Machine Images) on AWS are configured in a way that disallows remote authentication as root (even though they may contain a public key in root’s authorized_keys file). The administrator connects as an unprivileged user, then enters sudo su or sudo -i to acquire root privileges. This may be a good practice, but it’s counterproductive when you want to use rsync. The utility may not be able to write to a directory owned by root:root unless root initiates the connection. So what’s the fix?

The simplest way to work around the problem is by activating root login in

/etc/ssh/sshd_config

by using the parameter PermitRootLogin.

This parameter can take one of four values:

PermitRootLogin without-password: this disables password authentication for root, allowing other authentication methods (such as keys);
PermitRootLogin forced-commands-only: this allows root login with public key authentication, but only if the ‘command’ option has been specified; no other authentication methods are allowed;
PermitRootLogin no: this setting prohibits login as root regardless of the authentication method used;
PermitRootLogin yes: allows unsafe login as root (you don’t want this one).

Set this option as follows:

PermitRootLogin without-password

and append your public key to the authorized_keys file located in the $HOME/.ssh folder that belongs to root. For more details on how to create and install your key pairs, read How to Set Up a SSH Connection Using Authentication Based on Key Pairs.

With PermitRootLogin without-password in place, SSH will still ask the root user for a password, even though no password will work. This behavior is rather unnerving (security by obscurity?). To suppress this behavior, add these two lines to /etc/ssh/sshd_config:

Match User root
PasswordAuthentication no

If you would like to prohibit password authentication for all users on the system, this will do the trick:

PasswordAuthentication no

Useful rsync options

Some useful rsync options include:

-a or --archive: activates the archive mode; this mode tells rsync to:
- traverse directories recursively (implies option -r),
- preserve:
  - symbolic links (-l or --links) and
  - device files and other special files (-D, which is equivalent to --devices --specials); additional options exist for restrictions on link validity and the like,
- transfer:
  - permissions (implied option -p or --perms),
  - user and group ownerships (-o for --owner and -g for --group), and
  - timestamps (-t or --times),
- but does not imply the -H option for hard links (this option can be set separately);
-e specifies the remote shell to use (see section on rsync via ssh);
-h or --human-readable: outputs file sizes in a human-readable format with units of K for kilobytes, M for megabytes, and G for gigabytes; if specified twice (-hh), the units are powers of 1024 instead of the default 1000;
-z or --compress enables compression;
--progress outputs a line-by-line activity report (implies -v for --verbose, an option that in itself can be very useful when troubleshooting connection problems);
-R or --relative replicates the complete source path at the destination instead of only the last path component (/the/complete/original/path/to/object becomes /destination/path/the/complete/original/path/to/object);
-n or --dry-run allows for a risk-free test run of rsync without actually copying anything; remember to ask for verbose output using either -v, --verbose or --progress.

rsync supports plenty of other useful options that make for rather sophisticated methods of operation. For additional inspiration, you can always refer to the manual:

man rsync