How to run duplicacy as a cron job on linux?

I think the closest might be ps aux (possibly with different options).

I’d add a pruning command to the script, so that the backup doesn’t increase endlessly in size.

E.g., by adding a line like this:

/usr/local/bin/duplicacy prune -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7 -threads 4

I currently have that, but I’m thinking about removing it and running prune as a separate (and less frequent) cron job. Pruning seems to use quite a lot of resources and it’s not really necessary to prune daily.

Does this make sense? I haven’t managed to investigate this further yet.

So far, pruning my few backups has been quick, so I didn’t even think about that.

But you’re right. It doesn’t need pruning that often – with my current settings, once a week should be sufficient. Fortunately, with your instructions, it’s easy to create a second script for pruning with less frequent runs.
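For reference, here’s a rough sketch of what that second script and a weekly crontab entry could look like (the repository path, script location and log file are just assumptions; the -keep options are the ones from above):

#!/bin/sh
# prune.sh - prune the storage less often than the backups run
cd /srv/NAS/ || exit 1
echo "$(date) Pruning storage for $PWD ..."
/usr/local/bin/duplicacy prune -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7 -threads 4
echo "$(date) Finished pruning storage for $PWD ..."

# crontab entry: run the prune script every Sunday at 03:00
0 3 * * 0 /path/to/prune.sh >> /var/log/duplicacy-prune.log 2>&1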


This is code I use (written many years ago) for setting a lock in a script. No script will be foolproof, but this should reduce instances of stale locks blocking the backup.

# Set a lock; returns 0 if the lock was obtained
SetLock() {
  local _PID="$1"
  local _FILE="$2"
  shift 2
  if test -s "$_FILE" ; then
    # Lock file exists: check whether the PID recorded in it is still alive
    read pid other < "$_FILE"
    test "$pid" = "$_PID" || {
      ps -p "$pid" >/dev/null && return 1     # Still running: locked
      echo "$_PID" "$@" > "$_FILE"            # Stale lock: take it over
    }
  else
    echo "$_PID" "$@" > "$_FILE"              # No lock yet: claim it
  fi

  # Make sure our PID got in first (if two processes are racing)
  read pid other < "$_FILE"
  test "$pid" = "$_PID"
}

# Example Usage
if SetLock $$ /tmp/mylockfile "testing setlock" ; then
  # do stuff
  echo obtained lock 1>&2
else
  echo already locked 1>&2
fi

For example, put that in /tmp/setlock.sh and try:

# No lock exists
$ bash /tmp/setlock.sh
obtained lock

# Lockfile already exists but is stale
$ bash /tmp/setlock.sh
obtained lock

# Lock file exists but stale in both cases (one process run after other)
$ bash /tmp/setlock.sh && bash /tmp/setlock.sh
obtained lock
obtained lock

# Lock file exists but is stale, except for the second process, which can't obtain the lock
# which process wins may vary, but only one will win
$ bash /tmp/setlock.sh | bash /tmp/setlock.sh
already locked
obtained lock
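To tie this back to the backup script from the OP, here’s a rough sketch of how SetLock could be used there (the lock file path is my pick; the SetLock function is assumed to be pasted in above or sourced from another file):

#!/bin/sh
# ... paste the SetLock() function from above here (or source it) ...

cd /srv/NAS/ || exit 1
if SetLock $$ /tmp/duplicacy-backup.lock "duplicacy backup" ; then
  echo "$(date) Backing up $PWD ..."
  /usr/local/bin/duplicacy -log backup -stats
  echo "$(date) Stopped backing up $PWD ..."
else
  echo "$(date) Another backup still appears to be running, skipping this run" 1>&2
fi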

A good way I have found to do locking is to use the handy flock command, which attempts to take a lock on a file or directory and then executes another command.

Once the process finishes the lock is released.

The lock can be taken on any file or directory with zero impact on most other operations…unless your application tries to take its own file lock along the way.

So reimplementing the backup script in the OP by @Christoph using flock could be done as follows:

#!/bin/sh
# Attempt to change to the repository and bail out if it's not possible
cd /srv/NAS/ || exit 1
echo "$(date) Backing up $PWD ..."
# Run duplicacy under a lock of /srv/NAS
flock -en /srv/NAS /usr/local/bin/duplicacy -log backup -stats
echo "$(date) Stopped backing up $PWD ..."

The above means that you can use similar locking for your prune process, to ensure your backup and prune jobs never overlap.
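For example, a prune script along the same lines could look like this (same repository and lock path as the backup script above; the retention options are the ones mentioned earlier in the thread):

#!/bin/sh
# Attempt to change to the repository and bail out if it's not possible
cd /srv/NAS/ || exit 1
echo "$(date) Pruning storage for $PWD ..."
# Same exclusive, non-blocking lock as the backup script, so the two never overlap
flock -en /srv/NAS /usr/local/bin/duplicacy -log prune -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
echo "$(date) Stopped pruning storage for $PWD ..."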

The other difference from the OP’s script is ensuring the script fails if the cd /srv/NAS/ command fails. In this case that is not a huge deal, as the duplicacy job would fail anyway…or maybe not, if the script happens to be executed from another repository location. For destructive scripts, missing little checks like these can be catastrophic. Take the following “junk cleanup” script as an example:

#!/bin/sh
# PLEASE DON'T RUN THIS!!!
cd /path/to/junk/folder
rm -rf *

If the cd fails, running the above will happily remove everything in the current directory that you have write access to (which is probably everything if you run it from your home dir), whereas adding || exit 1 after the cd command prevents this.

Another option is to add set -e near the top of your script, which makes the whole script exit as soon as any command returns a non-zero status, so the error checking comes for free:

#!/bin/sh
# This is now safe
set -e
cd /path/to/junk/folder
rm -rf *

cd /path/to/junk/folder && rm -rf *

Yes, definitely a better option for the contrived example I put together :+1:

Could someone help summarize the essence of this topic so far? I’m a bit confused.

What I gather is:

  1. OP proposed a tutorial for running duplicacy automatically using cronjob.
  2. The core of this tutorial has not been challenged but various people suggested ways to prevent multiple instances of duplicacy from running at the same time.
  3. Apparently the easiest one is to use flock, but I’m not sure I understand all the implications (for example: why lock /srv/NAS/? Doesn’t that prevent all other programs from accessing that directory?)

So if I want to update the OP with the flock option, I just replace the plain duplicacy backup command with the flock version above in step 6, right?

I deleted my previous post as I didn’t realize a post on flock already existed. There are two different approaches to locking discussed here:

  • Ensuring that backups don’t overlap (using a lock file)
  • Ensuring exclusive access to a file/directory while the command is running

Which one to use depends on your use-case.


Taking the lock on a file or directory (to be accurate, there is little difference between the two under *nix) does not prevent other processes from accessing it; it only prevents them from taking the same lock, which is what flock attempts before it runs the specified command.

Oh, good. So what’s the purpose of also locking /usr/local/bin/duplicacy?

That part isn’t taking a lock; it’s telling flock what to execute after it acquires the lock.

The general syntax is:

flock [options] <lockfile> <command> [args...]

So basically anything after the lockfile is the command line flock will execute while holding the lock, with the lock released once that process finishes.

Some additional options supported by flock include waiting for a period of time to take the lock (-w), or taking a shared lock (-s), which allows multiple shared holders to run at once but prevents anything from taking an exclusive lock.

I can’t think of a use for that in this particular case, but it could be used to allow certain things to run in parallel while ensuring one other task runs by itself.
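Purely to illustrate the syntax of those options (the lock path is the one from the backup script; the second command is a made-up placeholder):

# Wait up to 10 minutes for the exclusive lock instead of giving up immediately
flock -e -w 600 /srv/NAS /usr/local/bin/duplicacy -log backup -stats

# Take a shared lock: several shared holders can run at the same time,
# but none of them while something holds the exclusive lock (and vice versa)
flock -s /srv/NAS /path/to/some-maintenance-script.sh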

The man page for flock probably explains this better than me:

http://manpages.ubuntu.com/manpages/bionic/man1/flock.1.html


I have a question about flock and exclusive file locks in general during Duplicacy backup:

One of the use cases I’m exploring is using Duplicacy to back up a handful of servers (including their configuration files and logs). My plan for how to handle this is to create a repository directory at /backup containing symlinks to various important stuff (following from Move duplicacy folder/Use symlink repository). For example /backup/home symlinks to /home, /backup/etc symlinks to /etc, and so on.

If I use flock to set a lock on the repository at /backup, is that lock going to follow the symlinks and put a lock on the actual /etc? These servers I’m backing up are running a number of live services 24/7, so I don’t know if I can guarantee that nothing else will attempt to modify any data in /etc (or wherever else, really — in fact I guess I can pretty much guarantee that plenty of other processes will attempt to modify the logs, for sure) during the time taken to run a Duplicacy backup job. Am I misunderstanding the proposal for how flock is used in this case?

Should I use flock to get a lock on a dummy lockfile instead of on the repo directory?

Edit: having read the flock documentation, I’m still unsure of what exactly happens if the lock is gotten on a directory instead of a file. Seems safest to specify a lockfile in the repo directory and have flock manage that lockfile instead.

Using a lock file or folder will not prevent other processes from changing files under those paths.

mkdir /tmp/a
flock /tmp/a -c bash

then on another terminal you can still do

cd /tmp/a
echo hello > world

flock(1) only prevents co-operating processes from obtaining the lock, i.e. processes that run flock on the same file or folder.
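So for the symlinked-repository case above, taking the lock on a dedicated lock file is a perfectly good option. A minimal sketch (the lock file path is an assumption):

#!/bin/sh
cd /backup || exit 1
# The lock is held on a plain file, not on the repository or any symlink target,
# so other services can keep writing to /etc, the logs, etc. while the backup runs
flock -en /var/lock/duplicacy-backup.lock /usr/local/bin/duplicacy -log backup -stats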

I wonder if using cron may be overkill?

I’m thinking about just having an infinite loop in the shell script that runs duplicacy backup:

#!/bin/sh
cd /srv/NAS/ || exit 1
while true; do
  echo "$(date) Backing up $PWD ..."
  /usr/local/bin/duplicacy -log backup -stats
  echo "$(date) Stopped backing up $PWD ..."
  sleep 3600
done

And then using start-stop-daemon to run it as a daemon.

This ensures two backups never run at the same time. You could also add some logic to check whether any files have changed since the last run and skip the backup if nothing has changed. That solves the problem outlined in Why add a revision when it is identical with the previous one?

The only disadvantage is that backups are no longer guaranteed to happen every hour, i.e. depending on how long each backup takes, the schedule will slowly drift. But I don’t see any problem with that.
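For reference, a rough sketch of the start/stop side with start-stop-daemon (the script path and pid file are assumptions):

# Start the loop script in the background and record its PID
start-stop-daemon --start --background --make-pidfile \
    --pidfile /var/run/duplicacy-backup.pid \
    --startas /usr/local/bin/duplicacy-backup.sh

# Stop it again (sends SIGTERM by default)
start-stop-daemon --stop --pidfile /var/run/duplicacy-backup.pid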

What’s everyone’s thoughts?

Thank you for the explanation — as you can probably guess, I have never used flock before :slight_smile:

How would start-stop-daemon handle stopping the daemon when duplicacy is in the middle of backing up?

In situations like this, I tend to use a signal file.

while test ! -f /var/run/stop.backup ; do
  ..
done
rm /var/run/stop.backup

You could write an /etc/init.d/backupservice script to handle starting and stopping the daemon.
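A bare-bones sketch of what that could look like, reusing the signal file from above (not a proper LSB init script; the daemon path and pid file are assumptions):

#!/bin/sh
# /etc/init.d/backupservice - minimal start/stop wrapper around the loop script
DAEMON=/usr/local/bin/duplicacy-backup.sh
PIDFILE=/var/run/duplicacy-backup.pid
STOPFILE=/var/run/stop.backup

case "$1" in
  start)
    rm -f "$STOPFILE"   # in case a previous stop left it behind
    start-stop-daemon --start --background --make-pidfile \
        --pidfile "$PIDFILE" --startas "$DAEMON"
    ;;
  stop)
    # Ask the loop to exit; it will notice the file at the start of its next iteration
    touch "$STOPFILE"
    ;;
  *)
    echo "Usage: $0 {start|stop}" 1>&2
    exit 1
    ;;
esac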

start-stop-daemon sends a SIGTERM signal when you tell it to stop a process (you can also change and send a different signal via the --signal flag).

You can trap this right before starting Duplicacy:

...
	trap "" SIGTERM
	/usr/local/bin/duplicacy -log backup -stats
	trap - SIGTERM
...

I believe this effectively “shields” Duplicacy from being killed (unless a harsher SIGKILL is used). This also works for services: stopping a service sends SIGTERM initially.

But then the service won’t stop at all as it will just ignore the SIGTERM.