Some basics about running duplicacy in a docker container

Christoph · 30 July 2020 11:52

I’m starting to appreciate docker on my home server. Running various apps and services in a container allows me to keep my operating system (Debian/OpenMediaVault) as clean as possible without having to run lots of virtual machines.

@saspus is providing a docker image for duplicacy (here: Bitbucket), so I take it that it makes sense to dockerize even duplicacy, but I’m not sure I fully understand:

a) why/under what circumstances this makes sense
b) how it works.

Ad a), here is what I found so far:

I do run docker as root and my machine is (for the time being) not exposed to the internet.

Hm, not sure how this is an advantage, as it also means I have to make sure everything I want to back up is indeed mounted. I think I prefer to rely on duplicacy’s filter file (as well as .nobackup files and, if need be, symlinks) for managing what is or is not backed up. Why introduce a second selection layer?

I totally see the advantage here but it applies exactly once (upon first setup) and I have learned that this is also easily solved with a tunnel.

So is there anything else to say about dockerizing duplicacy?

Ad b), what I’m mainly wondering is: how does duplicacy get access to the files it should back up? I realize that the answer might already be up there in this very post: you have to mount everything. But if this is correct, then I’d anticipate all kinds of access problems which I probably wouldn’t have when running duplicacy as root on the server directly?

I’d also anticipate confusion around repository paths, as these will be relative to the container, not the host. Not unsolvable, of course, but it adds friction.

Please let me know if I’m missing any important points (I’m pretty sure I am).

dephcon · 30 July 2020 13:04

it’s great for running on limited functionality linux systems/appliances, like unRAID in my case, where you can’t just install a deb or rpm.

i get the app and the webui in one package and I can inject mounts into the container as i see fit:

my server’s root filesystem as read only
a very specific path (dedicated to duplicacy) where i want the local backup to land as RW
config/logs to a raid protected ssd pool
cache to a ‘scratch’ ssd that isn’t protected/backed up

and because i separately backup the configuration of this container, it’s easily reproduced if i lose my docker.img
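A rough sketch of how the four mounts listed above might be collected into `docker run` arguments. The host paths are supplied by the caller, and the container-side paths (`/source`, `/storage`, `/config`, `/logs`, `/cache`) are hypothetical placeholders chosen for illustration; check which paths your image actually expects.

```bash
#!/usr/bin/env bash
# Sketch: collect the --volume arguments for the mount layout described above.
# Host paths come from the caller; the container-side paths are hypothetical
# placeholders, not necessarily what any particular image uses.

build_mount_args() {
    local root_fs="$1" backup_dest="$2" appdata="$3" scratch="$4"
    MOUNT_ARGS=(
        --volume "${root_fs}:/source:ro"        # server's root filesystem, read-only
        --volume "${backup_dest}:/storage:rw"   # dedicated local backup target, read-write
        --volume "${appdata}/config:/config"    # config on the RAID-protected SSD pool
        --volume "${appdata}/logs:/logs"        # logs alongside the config
        --volume "${scratch}:/cache"            # cache on the unprotected scratch SSD
    )
}
```

Usage would look like `build_mount_args / /mnt/backup/duplicacy /mnt/ssd/appdata/duplicacy /mnt/scratch/duplicacy-cache` followed by `docker run -d "${MOUNT_ARGS[@]}" <image>`, where all paths and `<image>` are placeholders. Only the backup destination is writable; everything Duplicacy reads from stays `:ro`.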

saspus · 30 July 2020 17:40

Think of docker as chroot + networking. It provides an entirely separate user-mode environment but relies on the host system’s kernel.

In contrast, a virtual machine runs both a kernel and userspace (via a hypervisor that can either run on the host OS or sit under it).

Docker is useful to isolate dependencies and convert a complex app consisting of multiple interacting services and libraries into one single disposable monolithic entity that operates on some data mapped out to the host system. Instead of updating apps in the container, one is expected to replace the whole container; since containers are isolated and immutable, their behavior is reliable and predictable.

For Duplicacy, therefore, containerization does not make much sense: it is written in Go and hence is already a single monolithic executable with no dependencies. Wrapping it into a container does not add value in that regard, as there are no dependencies to isolate to begin with.

Some systems (Synology, unRAID, etc.) provide a Docker UI, and Docker Hub can be viewed as a sort of software distribution channel, so this side of the Docker infrastructure may be useful to simplify Duplicacy installation. I suspect that’s the reason the majority of people use it.

Even though I wrote that container to run Duplicacy on synology I don’t use it myself anymore:

But since it has now got 500k pulls, I feel obligated to keep supporting it…

dephcon · 30 July 2020 18:11

thank you for continuing!

that said, now i feel a bit wary about dropping 4x5yr licenses when my trial period expires, as i don’t have much option if the container disappears

saspus · 30 July 2020 18:48

Haha. Don’t worry about that, I’ll be supporting it while I’m around; it does not really take much work. I stopped using it about a year and a half ago, and yet I did update it a couple of times during that period, FWIW.

Actually, you are likely using the :latest tag there, which has a specific version of duplicacy_web baked into the container, so when @gchen releases a new version I need to update the version string and have the container rebuilt, and users need to download the new image and update their instance.

There is now another tag, :mini. It does not have anything baked in; it’s an extremely lightweight shell. To update to a new duplicacy you don’t need to download a new image: you just update the version string in the environment variable and restart the container. It will download the new duplicacy_web on start (the same behavior as duplicacy_web downloading the new Duplicacy CLI on start).

This contradicts Docker ideology a bit (that containers are immutable), but duplicacy_web already does this with the CLI, so immutability has already been thrown out of the window; all I did was extend the same behavior to duplicacy_web itself.

This, however, aligns better with my understanding of the intended use from my previous comment (as an installation facilitator), and in that regard a thin shell is preferable.

With :mini you don’t really need to ever update the container itself, as nothing would change there (aside from drastic changes in Docker itself, but then I’d assume Docker Hub would rebuild existing containers anyway; or some crazy bug in the container script, but those scripts are rather simple so I don’t anticipate much).

Eventually the :mini branch will become :latest in a few months anyway (when I’m absolutely sure it is working reliably)
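The :mini start-up behavior described in this post (fetch duplicacy_web based on a version string from an environment variable) can be sketched roughly as below. This is a hypothetical illustration, not the container's actual script: the marker-file approach and the function names are assumptions, and the download step itself is left out.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the start-up decision a thin ":mini"-style container
# could make: download duplicacy_web only when the requested version differs
# from the one recorded after the last successful download. The marker file
# and function names are assumptions, not the image's real implementation.

needs_download() {
    local requested="$1" marker="$2"
    # Nothing recorded yet: a download is needed.
    [ -f "$marker" ] || return 0
    # Otherwise download only if the recorded version is different.
    [ "$(cat "$marker")" != "$requested" ]
}

record_version() {
    local version="$1" marker="$2"
    # Remember what was installed so the next start can skip the download.
    printf '%s\n' "$version" > "$marker"
}
```

An entrypoint would call something like `needs_download "$VERSION_FROM_ENV" /config/.web_version`, fetch the binary only when that succeeds, and then call `record_version`; `VERSION_FROM_ENV` and the marker path are hypothetical names. Changing the variable and restarting is then enough to upgrade, which is the point of the thin shell.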

saspus · 30 July 2020 22:11

Doh, missed this question due to infinitesimal attention span…

You are correct that duplicacy can only see what’s visible from the container, so you would map stuff into the container from outside. In fact, you can map multiple things into a single share, effectively replicating duplicacy’s existing functionality of collecting a bunch of folders from all over the system under a single virtual “repo”.

e.g.:

docker run ... \
 --volume /host/path1/:/backuproot/path1:ro  \
 --volume /host/totally/different/path2/:/backuproot/path2:ro \
...

and then point duplicacy at /backuproot to create the repository there; it would have path1 and path2 subfolders.
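The same idea can be generated from a list of host folders instead of being typed out by hand. This is a sketch that keeps `/backuproot` from the example above and, as its own choice, names each subfolder after the host path's basename, refusing two sources that would collide.

```bash
#!/usr/bin/env bash
# Sketch: turn host directories into read-only --volume arguments that all
# land under one container folder, so Duplicacy sees a single repository with
# one subfolder per source. Naming subfolders by basename is an assumption of
# this sketch; two sources with the same basename are rejected.

backuproot_volumes() {
    local root="/backuproot" path name
    local -A seen=()
    VOLUME_ARGS=()
    for path in "$@"; do
        name="$(basename "$path")"
        if [[ -n "${seen[$name]:-}" ]]; then
            echo "duplicate subfolder name '$name' ($path)" >&2
            return 1
        fi
        seen[$name]=1
        VOLUME_ARGS+=(--volume "${path%/}/:${root}/${name}:ro")
    done
}
```

For example, `backuproot_volumes /host/path1 /host/totally/different/path2` fills `VOLUME_ARGS` with the two mappings from the example, ready to be passed as `docker run "${VOLUME_ARGS[@]}" …`.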

Access problems are solved on the host: you would create a user “Duplicacy” on the host system and give it access to read the stuff it needs to back up. Then you would run the container on behalf of that user. Or, in the case of my container, you can specify the UID and GID under which to run duplicacy, so you can run your docker container as root but duplicacy_web will run as the specified user.
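For the first option (running the container on behalf of a dedicated host user), the numeric IDs can be looked up with `id` and passed to Docker's `--user` option. A minimal sketch; the account name is whatever you created, and the image-specific UID/GID settings of the container mentioned above are not covered here.

```bash
#!/usr/bin/env bash
# Sketch: look up the numeric UID/GID of a dedicated host account and format
# them for docker run's --user option, so processes in the container carry
# the same file permissions as that account on the host.

user_flag() {
    local account="$1" uid gid
    uid="$(id -u "$account")" || return 1   # fails if the account does not exist
    gid="$(id -g "$account")" || return 1   # primary group of the account
    printf -- '--user=%s:%s\n' "$uid" "$gid"
}
```

Usage would be along the lines of `docker run "$(user_flag duplicacy)" …`, assuming a host account named `duplicacy` that has read access to the mapped folders.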

The path names: nothing prevents you from mapping them to the same place in the container as they are on the host, if you wanted to:

docker run ... \
 --volume /some/crazy/path/of/sentimental/value/1:/some/crazy/path/of/sentimental/value/1:ro  \
 --volume /another/fine/path:/another/fine/path:ro \
...

Then the duplicacy instance in the container will see the same paths. I, however, prefer to map everything under a single mappable share, but that’s a matter of taste.

dephcon · 5 August 2020 01:20

switched to the :mini version to upgrade to 1.4.0, thanks for the heads up.

saspus · 5 August 2020 03:22

(I’m in the woods camping with family, will update the :latest to 1.4.0 by the end of the week)

If :mini does not work for you or misbehaves in any way, I’d like to know (if you have a minute).

dephcon · 5 August 2020 12:36

works totally fine, just switched the tag and added the version variable and it booted with my existing config.

Christoph · 4 September 2020 21:24

This might be another reason for dockerizing duplicacy:

saspus · 5 September 2020 09:38

Wow. That’s a nice workaround if one already has a Docker engine running. Otherwise, installing Docker just to throttle Duplicacy sounds like massive overkill.

I’m going to switch to using a container and see how it goes

Christoph · 5 September 2020 11:43

Yeah, I think someone on the internet is maintaining a docker image; try that one and tell us if it works

And if you know how to set those CPU flags with docker compose (v2), please share

Christoph · 5 September 2020 12:14

Now that dockerizing is becoming an option again, I’m having the following thoughts about another possible advantage: portability. A while ago I migrated my NAS/home server from Ubuntu to Openmediavault and for various reasons the paths to some of the files that duplicacy is supposed to backup changed.

Thanks to duplicacy’s deduplication I didn’t have to worry about (too much) extra storage being wasted, but, nevertheless, I don’t like that I now have to remember different paths, depending on which version I want to restore. So it occurs to me that running duplicacy in a container would have allowed me to keep the same paths (by mapping volumes into the container accordingly). Even better, if I had already used docker on the Ubuntu machine, the (re-)mapping of the paths would have been even more natural.

You might say that there are other ways of achieving the same, like symlinks, or using multiple repositories rather than only two like I did, but to me the docker way feels somewhat cleaner, especially when you’re already running everything else in containers.
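One way to make the "keep the same paths" idea concrete: fix the paths Duplicacy sees inside the container and keep the host-specific locations in a single table, so a host migration only changes the right-hand side. A sketch with hypothetical paths throughout.

```bash
#!/usr/bin/env bash
# Sketch: pin the paths Duplicacy sees inside the container and keep the
# host-specific locations in one table. After a host migration only the
# right-hand sides change. All paths here are hypothetical examples.

declare -A PATH_MAP=(
    [/data/photos]=/srv/data/photos      # e.g. previously /home/me/photos
    [/data/documents]=/srv/data/docs
)

stable_volume_args() {
    local container_path
    VOLUME_ARGS=()
    # Sort the keys so the generated arguments come out in a stable order.
    while IFS= read -r container_path; do
        VOLUME_ARGS+=(--volume "${PATH_MAP[$container_path]}:${container_path}:ro")
    done < <(printf '%s\n' "${!PATH_MAP[@]}" | sort)
}
```

With this layout the repository inside the container always lives at the same `/data/...` paths, whichever distribution or disk layout the host currently uses.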

GuillaumeBoudreau · 5 September 2020 13:13

I’ve been running duplicacy in docker on my home server for quite a while now, and I’m happy with it. Here’s how I do it.

run.sh is scheduled using cron (once a day for me; choose whatever frequency fits your mood):

#!/bin/bash

set -e
if [[ $EUID -ne 0 ]]; then
    >&2 echo "This script must be run as root."
    exit 17
fi

# chdir to folder that contains this script
cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1

LOG="$(pwd)/run.log"

echo "[$(date)] Starting backups using duplicacy ..." >> "${LOG}"

echo "[$(date)] Building duplicacy docker ..." >> "${LOG}"
(cd docker-build/ && docker build -t duplicacy . >/dev/null)  # already running as root, so no sudo needed
echo "[$(date)]   Done." >> "${LOG}"

for d in _vol/backups/*; do
    echo "[$(date)] Starting backup of $d ..." >> "${LOG}"
    docker run --rm \
        --name duplicacy-backup-to-gdrive \
        --hostname duplicacy \
        --cpu-shares 512 \
        -v "$(pwd)/_vol/scripts":/scripts \
        -v "$(pwd)/$d:/backups" \
        -v "$(pwd)/_vol/gcd-token.json":/backups/.duplicacy/gcd-token.json \
        -t duplicacy \
        /scripts/duplicacy-backup.sh "/backups"
    echo "[$(date)]   Done." >> "${LOG}"
done

That script lists folders (symlinks to folders, really) in _vol/backups/ and runs duplicacy backup on each of them, one by one.
Examples of what I have in my _vol/backups/ folder:

 etc -> /etc
'Home Movies' -> '/mnt/samba/Videos/Home Movies'
'Other backups' -> '/mnt/samba/Backups/Other backups'
 persistent-app-data -> /mnt/hdd5/persistent-app-data
 Photos -> /mnt/samba/Photos

For duplicacy to be able to backup each of those, they each need to have their own .duplicacy/preferences|filters configuration (with a different snapshot id). But I am backing them all up to the same storage (Google Drive), so I’ve put the required gcd-token.json in my _vol folder, and I just mount it into /backups/.duplicacy/gcd-token.json (which is the reference I need to have in my preferences files).
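Since every folder needs its own preferences file with a distinct snapshot id, a quick pre-flight check can catch a missing file or a copy-pasted id before the nightly loop runs. This sketch assumes the preferences file is the usual JSON with an `"id"` key and extracts the first one with grep/sed rather than a JSON parser, so treat it as a rough sanity check; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check that every folder under the backups directory has its own
# .duplicacy/preferences and that no two folders reuse a snapshot id.
# Assumes an "id": "..." entry in the JSON; this is not a full validator.

snapshot_id() {
    grep -o '"id": *"[^"]*"' "$1" | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

check_snapshot_ids() {
    local dir prefs id status=0
    local -A owner=()
    for dir in "$1"/*/; do                 # */ also matches symlinks to folders
        [[ -d "$dir" ]] || continue        # nothing matched: skip the literal pattern
        prefs="${dir}.duplicacy/preferences"
        if [[ ! -f "$prefs" ]]; then
            echo "missing: $prefs" >&2
            status=1
            continue
        fi
        id="$(snapshot_id "$prefs")"
        if [[ -z "$id" ]]; then
            echo "no snapshot id in $prefs" >&2
            status=1
        elif [[ -n "${owner[$id]:-}" ]]; then
            echo "snapshot id '$id' used by ${owner[$id]} and $dir" >&2
            status=1
        else
            owner[$id]="$dir"
        fi
    done
    return "$status"
}
```

Calling `check_snapshot_ids _vol/backups` from run.sh before the loop would make a misconfigured folder fail early (under `set -e`) instead of silently writing into another folder's snapshot id.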

Side note: by specifying --name=... when running my docker container, I ensure no two backups will be executed simultaneously, because container names need to be unique.

/scripts/duplicacy-backup.sh is just a helper script I use to run duplicacy backup; I prefer to mount it when I run my container, versus including it in the container image, because it makes it easier to modify it and just re-run the container, without having to re-build the image. (It’s for development mostly…)

Keeping the relevant parts, here’s my scripts/duplicacy-backup.sh:

#!/bin/bash

export REPO="/backups"
export BACKUP_OPT="-stats -threads 6"
export CHECK_OPT="-stats"
export PRUNE_OPT="-all -threads 6 -keep 0:3650 -keep 365:365 -keep 30:30 -keep 7:1" # each rule needs its own -keep <n:m>: keep 1 revision every n days for revisions older than m days
export EXEC="/usr/local/bin/duplicacy"
export EXEC_OPT="-log"
export LOG_DIR="${REPO}/.duplicacy/logs"
## End of config

if [[ $EUID -ne 0 ]]; then
   echo "This script must be run as root" 1>&2
   exit 1
fi

cd "${REPO}" || exit

mkdir -p "${LOG_DIR}"

LOG="${LOG_DIR}/backup-log-$(date +'%Y%m%d-%H%M%S')"
COMMAND="${EXEC} ${EXEC_OPT} backup ${BACKUP_OPT}"
echo "Command: ${COMMAND}" >> "${LOG}"
# shellcheck disable=SC2086
nice ${COMMAND} >> "${LOG}" 2>&1

LOG_LINES=$(grep -v " INFO " "${LOG}" | grep -v "^Command:")
if [ "${LOG_LINES}" != "" ]; then
    echo "${LOG_LINES}" | mail -s "duplicacy backup warnings/errors" guillaume@me.com
fi

# Once a month, do a check
if [ "$(date +'%d')" = "01" ]; then
    touch .duplicacy/ran_check
    
    LOG="${LOG_DIR}/check-log-$(date +'%Y%m%d-%H%M%S')"
    COMMAND="${EXEC} ${EXEC_OPT} check ${CHECK_OPT}"
    echo "Command: ${COMMAND}" >> "${LOG}"
    # shellcheck disable=SC2086
    nice ${COMMAND} >> "${LOG}" 2>&1

    LOG_LINES=$(grep -v " INFO " "${LOG}" | grep -v "^Command:")
    if [ "${LOG_LINES}" != "" ]; then
        echo "${LOG_LINES}" | mail -s "duplicacy check warnings/errors" guillaume@me.com
    fi
elif [ "$(date +'%d')" = "02" ]; then
    rm -f .duplicacy/ran_check
fi

LOG="${LOG_DIR}/prune-log-$(date +'%Y%m%d-%H%M%S')"
COMMAND="${EXEC} ${EXEC_OPT} prune ${PRUNE_OPT}"
echo "Command: ${COMMAND}" >> "${LOG}"
# shellcheck disable=SC2086
nice ${COMMAND} >> "${LOG}" 2>&1

LOG_LINES=$(grep -v " INFO " "${LOG}" | grep -v "^Command:")
if [ "${LOG_LINES}" != "" ]; then
    echo "${LOG_LINES}" | mail -s "duplicacy prune warnings/errors" guillaume@me.com
fi

# Delete empty log files
find "${LOG_DIR}" -type f -size 0 -delete

# Delete old log files
find "${LOG_DIR}" -type f -mtime +30 -delete
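To make the retention rules in PRUNE_OPT more concrete: each `-keep n:m` is given as its own option, listed with m in decreasing order, and n = 0 means revisions in that age range are deleted. The sketch below only illustrates which rule governs a revision of a given age; it does not replicate `duplicacy prune`, and treating "older than m days" as age >= m is an assumption about the boundary.

```bash
#!/usr/bin/env bash
# Sketch: given a revision's age in days and the n:m rules (sorted by m in
# decreasing order), print the spacing n of the first rule whose m the age
# has reached. 0 means "delete"; "all" means no rule applies yet.
# The age >= m boundary is an assumption, not taken from duplicacy's source.

keep_interval() {
    local age="$1"; shift
    local rule n m
    for rule in "$@"; do
        n="${rule%%:*}"
        m="${rule##*:}"
        if (( age >= m )); then
            echo "$n"
            return 0
        fi
    done
    echo "all"   # younger than every m: every revision is kept
}
```

With the rules above, `keep_interval 400 0:3650 365:365 30:30 7:1` prints `365` (one revision per year is kept at that age) and `keep_interval 45 0:3650 365:365 30:30 7:1` prints `30`.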

Finally, here’s my docker-build/Dockerfile:

FROM alpine:3.10

RUN apk add --no-cache curl bash ssmtp mailx php-cli

ARG DUPLICACY_VERSION=2.6.2
RUN curl -sLo /usr/local/bin/duplicacy "https://github.com/gilbertchen/duplicacy/releases/download/v$DUPLICACY_VERSION/duplicacy_linux_x64_$DUPLICACY_VERSION" \
 && chmod +x /usr/local/bin/duplicacy

# SSMTP (to be able to send emails)
COPY config/ssmtp.conf /etc/ssmtp/ssmtp.conf
RUN echo "hostname=`hostname`.home.me.com" >> /etc/ssmtp/ssmtp.conf

RUN apk add --no-cache tzdata && cp /usr/share/zoneinfo/America/Montreal /etc/localtime && echo "America/Montreal" >  /etc/timezone && apk del tzdata

CMD ["bash"]

LABEL description="duplicacy" \
      duplicacy="duplicacy v$DUPLICACY_VERSION" \
      maintainer="Guillaume Boudreau <guillaume@me.com>"

Christoph · 5 September 2020 17:45

Not sure how stupid this question is but why are you building the container anew every day?

GuillaumeBoudreau · 5 September 2020 18:03

It’s a safeguard against me forgetting to do a docker build after changing a file included in my build.
If no files have changed, the build process will just re-use the cached layers, and that command will return very fast.

When a new duplicacy version is released, I update my Dockerfile to put the new version number in there, and that’s it. Same if I change any of the files in my image (like the ssmtp config file, for example). Easier to just execute run.sh (or better, just wait for the cron job to run it) than having to remember to do a docker build every time I make a change.
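The version bump described here can itself be scripted. This sketch rewrites the `ARG DUPLICACY_VERSION=...` line of the Dockerfile shown above (the function name is hypothetical); because the ARG instruction changes, the next `docker build` in run.sh reuses cached layers only up to that point and re-runs the download. An alternative that leaves the file untouched is `docker build --build-arg DUPLICACY_VERSION=<version>`, which overrides the ARG default at build time.

```bash
#!/usr/bin/env bash
# Sketch: bump the pinned duplicacy CLI version in a Dockerfile that contains
# an "ARG DUPLICACY_VERSION=..." line. Rejects anything that is not x.y.z.

set_duplicacy_version() {
    local dockerfile="$1" version="$2"
    if ! [[ "$version" =~ ^[0-9]+\.[0-9]+\.[0-9]+$ ]]; then
        echo "not a version number: $version" >&2
        return 1
    fi
    # -i.bak works with both GNU and BSD sed; the backup is removed afterwards.
    sed -i.bak "s/^ARG DUPLICACY_VERSION=.*/ARG DUPLICACY_VERSION=${version}/" "$dockerfile" &&
        rm -f "${dockerfile}.bak"
}
```

For example, `set_duplicacy_version docker-build/Dockerfile 2.7.0` followed by the usual cron run of run.sh would pick up the new binary.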

Christoph · 5 September 2020 21:06

I believe that duplicacy-web automatically downloads the latest CLI version. So you might be able to simplify things even more by using duplicacy-web instead of the CLI version.

Christoph · 1 October 2020 16:00

This topic was automatically closed after 26 days. New replies are no longer allowed.