How to run duplicacy as a cron job on Linux?

There are certainly many ways of doing this but here is how I managed to schedule duplicacy on a Linux server (Ubuntu 18.04).

(If you know a better/different way, please add it! Or if I made a mistake (I’m not a Linux expert!), please correct it.)

To start with: I'm running duplicacy as root so that I can back up files from different users in the same backup. If you are just backing up files from a single user, you can probably save yourself a lot of trouble by not using sudo.

1. Initialize your repository (mine is called NAS)

cd /srv/NAS/
sudo duplicacy -d init -e NAS webdav://<myusername>@webdav.pcloud.com/Backup/Duplicacy

Make sure you adjust the working directory and the init command to suit your needs.

You will probably get some errors like

Reading the environment variable DUPLICACY_WEBDAV_PASSWORD
Failed to store the value to the keyring: keyring/dbus: Error connecting to dbus session, not registering 
SecretService provider: dbus: DBUS_SESSION_BUS_ADDRESS not set

Just ignore them and enter the requested passwords (which will trigger another error, but we'll tackle that in the next step).

2. Save your passwords into your preferences file

sudo duplicacy set -key webdav_password -value "this is my webdav passphrase"


sudo duplicacy set -key password -value "this is my storage passphrase"

This is probably not the most secure method, but I couldn't figure out how to get environment variables to work with sudo… (I guess security is okay, since only root has access to the preferences file.)
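For reference, one way to hand environment variables to duplicacy through sudo (a sketch only, I haven't verified it in this setup) is to set them on the sudo command line via env. DUPLICACY_WEBDAV_PASSWORD is the variable named in the error above; DUPLICACY_PASSWORD is assumed here to be the one for the storage passphrase. Note this isn't obviously more secure either, since the command line is visible in ps:

# Sketch: pass the passphrases for a single run instead of storing them
# in the preferences file (variable names partly assumed, see above)
sudo env \
  DUPLICACY_PASSWORD="this is my storage passphrase" \
  DUPLICACY_WEBDAV_PASSWORD="this is my webdav passphrase" \
  duplicacy -log backup -stats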

3. Set up your filters file

See this topic on how to do this.
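To give a rough idea of what the file looks like, here is a minimal exclude-only sketch (the patterns are made up; adjust them to your own layout; the file lives at .duplicacy/filters inside the repository):

# .duplicacy/filters (hypothetical example)
# Exclude temporary files and editor backups everywhere
-*.tmp
-*~
# Exclude an entire top-level directory (a trailing slash matches directories)
-lost+found/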

4.

There shall be no step 4 in this tutorial :laughing:

5. Run your first backup

Before running the backup automatically, I like to test it manually to see that things are working fine:

If you run something like

sudo duplicacy -log backup -stats

you should not be asked for any password.

You can either let your initial backup run through this way (in that case, perhaps use sudo duplicacy -log backup -stats & to let it run in the background) or you can stop it and let it be triggered according to schedule (see below).

6. Create a backup script

I want to keep logs of all automated backups and since I did not manage to achieve this on a single line in crontab, I use a script (which also adds some flexibility):

#!/bin/sh
cd /srv/NAS/ 
echo "`date`" Backing up $PWD ...
/usr/local/bin/duplicacy -log backup -stats
echo "`date`" Stopped backing up $PWD ...

Save the script wherever it suits you and don’t forget to make it executable. Mine is /home/christoph/backup_NAS.sh.

7. Schedule the backup

sudo crontab -e

and add something like

30 3 * * * /home/christoph/backup_NAS.sh > /srv/NAS/christoph/duplicacy/logs/NAS_backup_`date "+\%Y-\%m-\%d-\%H-\%M"`.log 2>&1

Adjust paths as appropriate.


In order to prevent a backup from running before the previous one has finished, I use a script that looks something like this:

$ cat /tank/.duplicacy/scripts/backup_tank.sh:

#!/bin/bash

# https://stackoverflow.com/a/185473/1388019
lockfile="/tmp/duplicacy_tank.lock"

if [ -e ${lockfile} ] && kill -0 `cat ${lockfile}`; then
    echo "duplicacy backup already running"
    exit
fi

# make sure the lockfile is removed when we exit and then claim it
trap "rm -f ${lockfile}; exit" INT TERM EXIT
echo $$ > ${lockfile}

# run the backup
cd /tank
/usr/local/bin/duplicacy -log backup -threads 4

# clean up lockfile
rm -f ${lockfile}

I suppose there are certain circumstances where the lock file will not be deleted (when the script doesn’t complete) and the consequence would be that no more backups will be made. Maybe I’m missing something but for my “set and forget” attitude, this is way too risky.

I'm not sure exactly how to do this, but how about checking the running processes and piping the list through grep to see whether it's already running, and waiting if it is?

If your setup is more complex, you could also use the comment option and grep for your comment.

Well, it’s not in bash syntax, but I’ve already used tasklist, as in the script below, with two consecutive calls to Rclone:

tasklist /FI "IMAGENAME eq rclone.exe" 2>NUL | find /I /N "rclone.exe" >NUL
rem bail out if rclone is still running, otherwise start the second call
if "%ERRORLEVEL%"=="0" exit /B
rclone  ..... (second exec)

Would have to work out what the equivalent of tasklist is in Linux and “translate” it to bash … :roll_eyes:

I think the closest might be ps aux (possibly with different options).
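Something along these lines might work as a rough translation (untested sketch; pgrep ships with procps, and with the comment option mentioned above you could grep for your own marker instead of the generic command line):

#!/bin/sh
# Sketch: bail out if a duplicacy backup is already running, otherwise back up
if pgrep -f 'duplicacy.*backup' > /dev/null; then
    echo "duplicacy backup already running"
    exit 0
fi
cd /srv/NAS/ || exit 1
/usr/local/bin/duplicacy -log backup -stats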

I’d add a pruning command to the script, so that the backup doesn’t increase endlessly in size.

E.g., by adding a line like this:

/usr/local/bin/duplicacy prune -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7 -threads 4

I currently have that, but I'm thinking about removing it and running prune as a separate (and less frequent) cron job. Pruning seems to use quite a lot of resources, and it's not really necessary to prune daily.

Does this make sense? I haven’t managed to investigate this further yet.

So far, pruning my few backups has been quick, so I didn't even think about that.

But you're right. It doesn't need pruning that often – with my current settings, once a week should be sufficient. Fortunately, with your instructions, it's easy to create a second script for pruning with less frequent runs.
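For example, a second script along the lines of the one in the OP could look like this (paths copied from the OP, keep policy from the post above; treat it as a sketch):

#!/bin/sh
cd /srv/NAS/
echo "`date`" Pruning $PWD ...
/usr/local/bin/duplicacy -log prune -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
echo "`date`" Stopped pruning $PWD ...

scheduled, say, weekly via sudo crontab -e:

30 5 * * 0 /home/christoph/prune_NAS.sh > /srv/NAS/christoph/duplicacy/logs/NAS_prune_`date "+\%Y-\%m-\%d-\%H-\%M"`.log 2>&1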


This is code I use (written many years ago) for setting a lock in a script. No script will be foolproof, but this should reduce instances of stale locks blocking the backup.

# Set a lock, returns 0 if successful
SetLock() {
  local _PID="$1"
  local _FILE="$2"
  shift 2
  if test -s "$_FILE" ; then
    read pid other < "$_FILE"
    test "$pid" = "$_PID" || {
      ps -p$pid >/dev/null && return 1        # Locked
      echo "$_PID" "$@" > "$_FILE"
    }
  else
    echo "$_PID" "$@" > "$_FILE"
  fi

  # Make sure our PID got in first (if two processes running in sync)
  read pid other < "$_FILE"
  test "$pid" = "$_PID"
}

# Example Usage
if SetLock $$ /tmp/mylockfile "testing setlock" ; then
  # do stuff
  echo obtained lock 1>&2
else
  echo already locked 1>&2
fi

For example, put that in setlock.sh and try

# No lock exists
$ bash /tmp/setlock.sh
obtained lock

# Lockfile already exists but is stale
$ bash /tmp/setlock.sh
obtained lock

# Lock file exists but stale in both cases (one process run after other)
$ bash /tmp/setlock.sh && bash /tmp/setlock.sh
obtained lock
obtained lock

# Lock file exists but is stale, except for the second process, which can't obtain the lock
# (which process wins may vary, but only one will win)
$ bash /tmp/setlock.sh | bash /tmp/setlock.sh
already locked
obtained lock

A good way I have found to do locking is to use the handy flock command, which attempts to take a lock on a file or directory and then executes another command.

Once the process finishes the lock is released.

The lock can be taken on any file or directory with zero impact on most other operations…unless your application tries to take its own file lock along the way.

So reimplementing the backup script in the OP by @Christoph using flock could be done as follows:

#!/bin/sh
# Attempt to change to repository and bail out if its not possible
cd /srv/NAS/ || exit 1
echo "$(date) Backing up $PWD ..."
# Run duplicacy under a lock of /srv/NAS
flock -en /srv/NAS /usr/local/bin/duplicacy -log backup -stats
echo "$(date) Stopped backing up $PWD ..."

The above means that you can use similar locking functionality for your prune process to ensure your backup and prune jobs are not overlapping.
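For example, a prune script could take the same lock (a sketch only; the keep policy is just the one suggested earlier in the thread):

#!/bin/sh
# Prune under the same lock as the backup script so the two never overlap
cd /srv/NAS/ || exit 1
echo "$(date) Pruning $PWD ..."
flock -en /srv/NAS /usr/local/bin/duplicacy -log prune -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
echo "$(date) Stopped pruning $PWD ..."

With -n the prune is simply skipped if the backup currently holds the lock; drop it (or use -w, see below) if you would rather have it wait.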

The other difference is ensuring the script fails if the cd /srv/NAS/ command fails, which in this case is not a huge deal as the duplicacy job will fail… or maybe not, if this script is executed from another repository location. For destructive scripts, missing little checks like these can be catastrophic; take the following “junk cleanup” script as an example:

#!/bin/sh
# PLEASE DON'T RUN THIS!!!
cd /path/to/junk/folder
rm -rf *

Running the above will happily remove everything in the current directory that you have write access to (which is probably everything if running from your home dir), whereas the addition of || exit 1 after the cd command will prevent this.

Another option is to add set -e near the top of your script, which causes any command that fails (returns a non-zero status) to abort the whole script, so your error checking is done for free, as follows:

#!/bin/sh
# This is now safe
set -e
cd /path/to/junk/folder
rm -rf *

cd /path/to/junk/folder && rm -rf *

Yes, definitely a better option for the contrived example I put together :+1:

Could someone help summarize the essence of this topic so far? I’m a bit confused.

What I gather is:

  1. OP proposed a tutorial for running duplicacy automatically using cronjob.
  2. The core of this tutorial has not been challenged but various people suggested ways to prevent multiple instances of duplicacy from running at the same time.
  3. Apparently the easiest one is to use flock, but I’m not sure I understand all the implications (for example: why lock /srv/NAS/? Doesn’t that prevent all other programs from accessing that directory?)

So if I want to update the OP with the flock option, I just replace the backup script from step 6 with the flock version above, right?

I deleted my previous post as I didn’t realize a post on flock already existed. There are two different approaches to locking discussed here:

  • Ensuring that backups don’t overlap (using a lock file)
  • Ensuring exclusive access to a file/directory while the command is running

Which one to use depends on your use-case.


Locking a file or directory (to be accurate, there is little difference under *nix) does not prevent any access by other processes, but it does prevent others from taking the same lock, which is what flock attempts before it runs the specified process.

Oh, good. So what’s the purpose of also locking /usr/local/bin/duplicacy?

This is not taking a lock; it's telling flock what to execute after it acquires the lock.

The general syntax is:

flock [options] lockfile command args

So basically anything after the lockfile is the command line flock will execute while holding the lock, with the lock released once that process finishes.

Some additional options supported by flock include waiting a period of time to attempt to take a lock, or alternatively taking a shared lock which would allow multiple instances to run but prevent an instance taking an exclusive lock.

I can’t think of an example in this case for that but it could be used to allow certain things to run in parallel but ensure one other task is run by itself.
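Purely to illustrate those flags (hypothetical commands; whether a shared lock makes sense for something like duplicacy check is a separate question):

# Wait up to 10 minutes for the exclusive lock instead of giving up immediately
flock -w 600 /srv/NAS /usr/local/bin/duplicacy -log backup -stats

# Take a shared lock: several of these can run in parallel, but none will run
# while something holds the exclusive lock above (and vice versa)
flock -s /srv/NAS /usr/local/bin/duplicacy -log check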

The man page for flock probably explains this better than I can:

http://manpages.ubuntu.com/manpages/bionic/man1/flock.1.html


I have a question about flock and exclusive file locks in general during Duplicacy backup:

One of the use cases I'm exploring is using Duplicacy to back up a handful of servers (including their configuration files and logs). My plan for how to handle this is to create a repository directory at /backup containing symlinks to various important stuff (following from Move duplicacy folder/Use symlink repository). For example /backup/home symlinks to /home, /backup/etc symlinks to /etc, and so on.

If I use flock to set a lock on the repository at /backup, is that lock going to follow the symlinks and put a lock on the actual /etc? These servers I’m backing up are running a number of live services 24/7, so I don’t know if I can guarantee that nothing else will attempt to modify any data in /etc (or wherever else, really — in fact I guess I can pretty much guarantee that plenty of other processes will attempt to modify the logs, for sure) during the time taken to run a Duplicacy backup job. Am I misunderstanding the proposal for how flock is used in this case?

Should I use flock to get a lock on a dummy lockfile instead of on the repo directory?

Edit: having read the flock documentation, I'm still unsure what exactly happens if the lock is taken on a directory instead of a file. It seems safest to specify a lockfile in the repo directory and have flock manage that lockfile instead.

Using a lock file or folder will not prevent other processes from changing files under those paths.

mkdir /tmp/a
flock /tmp/a -c bash

then on another terminal you can still do

cd /tmp/a
echo hello > world

flock(1) prevents co-operating processes from obtaining the lock, i.e. processes running flock on the same file or folder.
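So for the symlinked-repository case above, taking the lock on a dedicated dummy file inside the repository should work fine; a minimal sketch (the lock file path is just an example, and flock will create the file if it doesn't exist):

#!/bin/sh
# The lock never touches or follows the backed-up data; it only coordinates
# the scripts that agree to use this same lock file
cd /backup || exit 1
flock -en /backup/.duplicacy/backup.lock /usr/local/bin/duplicacy -log backup -stats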

I wonder if using cron may be overkill?

I’m thinking about just having an infinite loop in the shell script that runs duplicacy backup:

#!/bin/sh
cd /srv/NAS/
while true; do
	echo "`date`" Backing up $PWD ...
	/usr/local/bin/duplicacy -log backup -stats
	echo "`date`" Stopped backing up $PWD ...
	sleep 3600
done

And then using start-stop-daemon to run it as a daemon.
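Something like this, perhaps (an untested sketch; the script path and pidfile name are made up):

# Start the loop script in the background and record its PID
start-stop-daemon --start --background --make-pidfile \
    --pidfile /var/run/duplicacy-backup.pid \
    --startas /home/christoph/backup_NAS_loop.sh

# Stop it again later
start-stop-daemon --stop --pidfile /var/run/duplicacy-backup.pid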

This ensures two backups never run at the same time. You could also add some logic to check whether any file has been modified since the last run and skip the backup if nothing has changed. That solves the problem outlined in Why add a revision when it is identical with the previous one?
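A rough sketch of that check, assuming a marker file that the loop touches after each successful backup (the marker path is made up, and the .duplicacy directory is pruned so its cache churn doesn't count as a change):

# Inside the loop, instead of calling duplicacy unconditionally:
marker=/srv/NAS/.duplicacy/last_backup_marker
if [ -e "$marker" ] && \
   [ -z "$(find /srv/NAS -path /srv/NAS/.duplicacy -prune -o -newer "$marker" -print -quit)" ]; then
	echo "`date`" Nothing changed, skipping backup
else
	/usr/local/bin/duplicacy -log backup -stats && touch "$marker"
fi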

The only disadvantage is that backups are no longer guaranteed to happen every hour, i.e. depending on how long your backup takes, the schedule will slowly drift. But I don't see any problem with that.

What are everyone's thoughts?