Run web-ui in a docker container?

Thank you for your research regarding the machine-id!

/var/lib/dbus/machine-id will be identical for everyone running your image since it’s “baked in”. So if two different users happen to use the same hostname (e.g. duplicacy), Duplicacy would detect a license violation, no? Having a static machine-id also seems, to me at least, to be contrary to the semantics of what the value is supposed to represent.

I’ve updated my image to generate a temporary machine-id when the container launches, but also to allow a few different ways for users to supply their own static value (e.g. a bind mount or an environment variable).
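
Something along these lines (a sketch; the exact mechanism may differ in the final image, and MACHINE_ID is a hypothetical variable name):

# supply a static machine-id via a bind mount
docker run -v /etc/machine-id:/var/lib/dbus/machine-id:ro ... erichough/duplicacy

# or via an environment variable
docker run -e MACHINE_ID="$(cat /etc/machine-id)" ... erichough/duplicacy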

Ooooops. You are right, I did not think that through. I assumed that the image would be rebuilt from source every time, which is not the case.

I think the solution would be to install dbus in the init script, as opposed to baking it into the image, since the machine-id seems to be generated by dbus-uuidgen in dbus’s post-install script.

Proof of concept:

aleximac:~ alex$ docker run -it alpine /bin/sh
/ # apk add dbus
...
(3/3) Installing dbus (1.10.24-r1)
Executing dbus-1.10.24-r1.pre-install
Executing dbus-1.10.24-r1.post-install
Executing busybox-1.28.4-r2.trigger
OK: 5 MiB in 16 packages
/ # cat /var/lib/dbus/machine-id
50ad3d4b81530f1ee29a3f7c5c50ea85
/ # exit
aleximac:~ alex$ docker run -it alpine /bin/sh
/ # apk add dbus
...
(3/3) Installing dbus (1.10.24-r1)
Executing dbus-1.10.24-r1.pre-install
Executing dbus-1.10.24-r1.post-install
Executing busybox-1.28.4-r2.trigger
OK: 5 MiB in 16 packages
/ # cat /var/lib/dbus/machine-id
076c0a40baf41f59667dc0595c50ea98
/ #

Thank you for catching that!

I’ll fix that and check in the fix in the evening.

Edit: Actually, I think it would be even better to leave dbus baked into the image, but run dbus-uuidgen from the init script unless a saved machine-id already exists in the config folder, in which case just copy it.

Edit: This is what I ended up doing; it takes care of new machine-id generation, verification, and correction.

# Preparing persistent unique machine ID
# --ensure creates the file if it's missing, and fails if existing contents are invalid
if ! dbus-uuidgen --ensure=/config/machine-id; then
	echo "machine-id contains invalid data. Regenerating."
	dbus-uuidgen > /config/machine-id
fi

# Copying machine-id to container
cp /config/machine-id /var/lib/dbus/machine-id
chmod o+r,g+r /var/lib/dbus/machine-id

Any reason why you don’t use dbus-uuidgen, which is intended to generate the machine-id, instead of simulating its behavior (which may change)? Also, I’m not sure urandom is diverse enough; I’d add some sort of crypto hash on top. Or just use the dbus tools.

Also, I’d validate user-supplied values (dbus-uuidgen can do that for you); duplicacy likely expects a compliant string.

Mostly because I don’t want to install dbus into the image just to create a 32-character string, especially since it can be done with a one-liner.
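
Something like this, for example (a sketch, not necessarily the exact line from my image; assumes busybox hexdump is available):

# 16 random bytes -> 32 hex characters, plus the trailing newline dbus expects
{ head -c 16 /dev/urandom | hexdump -e '16/1 "%02x"'; echo; } > /var/lib/dbus/machine-id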

dbus-uuidgen internally reads from /dev/urandom, too.

Not to get off-topic, but I don’t know of a better source of random data. And I don’t think that hashing a random value will produce a more random value; if anything, hashing could have the opposite effect.

I’d argue that anything can be written in one line, but it is less readable than using a dedicated tool. In addition, duplicacy uses dbus, so it does make sense to use the same environment instead of simulating it; space is cheap.
Also, dbus may be useful later, for example to communicate with the keychain to work with encryption keys, etc.

They sort of make an attempt to add variability by concatenating the current time to it, but honestly, I did expect more.

Since you are reading random data almost the very first thing after boot, the entropy pool may not be deep enough yet (see /proc/sys/kernel/random/entropy_avail). /dev/urandom will not block and may return poor-quality random data; and since this is used to derive license keys, we really want to minimize collisions, so I would at least wait for sufficient entropy to accumulate. Simply using /dev/random should do the trick, as it will block until enough entropy is available.
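
For example, something like this in the init script (a sketch; 256 bits is an arbitrary threshold):

# wait until the kernel reports a reasonably full entropy pool
while [ "$(cat /proc/sys/kernel/random/entropy_avail)" -lt 256 ]; do
    sleep 1
done
dbus-uuidgen > /config/machine-id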

The amount of entropy won’t change, but the correlation between bits will (when using a crypto hash, not just a CRC). In fairness, I can’t imagine that somebody will try to exploit that to get a free duplicacy license :slight_smile: but addressing this does not cost anything and may reduce the probability of collisions.

I hear ya and respect your idea, but in this case I just have a different opinion. ¯\_(ツ)_/¯

I disagree here. Remember that a container shares the same kernel with the host, so reading from /dev/{u}random in the container will, behind the scenes, ask the same kernel for random bytes. You can verify this yourself:

cat /proc/sys/kernel/random/entropy_avail && docker run --rm alpine cat /proc/sys/kernel/random/entropy_avail

You’ll see that the available entropy is essentially the same, even in the freshly-started container.

You could make the argument that a physical system or virtual machine will have low entropy on boot, but even then many distros persist a seed file across reboots (/var/lib/systemd/random-seed, restored by systemd-random-seed) to help with entropy after booting.

The consensus among the experts, as I interpret it at least, is that urandom is recommended for the vast majority of cryptographic uses. There are a few rare cases where /dev/random is better, like embedded systems right after boot, or information-theoretic security.

Again I have to disagree. A hash of any kind (crypto or otherwise) simply maps a set of bits to another set of a fixed number of bits; it won’t improve the randomness of the final value. Even if you salt the input or use a randomness-extraction function on it, you’re relying on additional random data that needs to come from somewhere.

But I agree with you that we’re dealing with a software license and not protecting someone’s bank account or medical history, so we might be going overboard :slight_smile: But it’s a good discussion, and I look forward to continuing to work together to develop images for Duplicacy.

Does anyone know where the stats for the dashboard are stored in the container? I’ve mounted the logs and stats directories to the host. I see the last run of the backups and schedules within those tabs, but every time I restart the container I lose the stats/history on the dashboard screen.

Thanks!

Are you asking about saspus/duplicacy-web or erichough/duplicacy ?

In the former, stats are kept in the /config/stats folder; in the latter, stats go to a separate volume, /var/cache/duplicacy.

I was asking about erichough/duplicacy.

I have that stats directory mounted on the host, but the dashboard stats still don’t persist between docker container restarts. I can see the stats on the schedules page, but not on the dashboard.

Any thoughts? Here’s my docker command:

docker run --name=duplicacy -v /etc/localtime:/etc/localtime:ro -p 3875:3875 -v /media/data/working/duplicacy-logs/:/root/.duplicacy-web/logs -v /media/data/working/duplicacy-stats/:/root/.duplicacy-web/stats -v /media/data/working/duplicacy/:/etc/duplicacy -v /media/:/storage erichough/duplicacy

I’m not the author, but take a look at the Dockerfile. You’ll find that the stats go under /var/cache/duplicacy; that path needs to be mapped to persist.
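
For example, in your command above, mapping the stats host directory to /var/cache/duplicacy instead of /root/.duplicacy-web/stats should do it (a sketch, reusing your host path):

docker run ... -v /media/data/working/duplicacy-stats/:/var/cache/duplicacy ... erichough/duplicacy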

Oh gotcha.

I am actually mapping it. In the container, ~/.duplicacy-web/stats is symlinked to /var/cache/duplicacy.

I see the stat files being persisted on the host, but it doesn’t look to me like those are the right files to populate the dashboard.

In the version you created, do the dashboard stats persist from run to run?

Just checked; seems to be preserving the stats just fine:

This is my test script to entirely wipe the container and image and start anew; the dashboard displays all past activity between invocations:

echo Deleting container
container=$(docker ps -a | grep duplicacy-web-container | awk '{print $1}')
if [ -n "$container" ]; then
    docker rm $container || exit 1
fi

echo Deleting image
image=$(docker images | grep saspus/duplicacy-web | awk '{print $3}')
if [ -n "$image" ]; then
    docker rmi $image || exit 2
fi

temp=/tmp/dupl01
mkdir -p $temp

# build
docker build --tag=saspus/duplicacy-web .   || exit 3

docker run --name duplicacy-web-container            \
           --hostname duplicacy-web-docker           \
           --publish 3875:3875/tcp                   \
           --env USR_ID=$(id -u)                     \
           --env GRP_ID=$(id -g)                     \
           --volume $temp/config:/config             \
           --volume $temp/logs:/logs                 \
           --volume $temp/cache:/cache               \
           --volume $temp/backuproot:/backuproot:ro  \
           --volume $temp/storage:/storage           \
           saspus/duplicacy-web

Edit: comment out the build command; you don’t need to build it, of course.

Thank you. I removed the container and image, rebuilt everything, and it seems to be working now. I’m not sure what I did differently, but recreating the volume mappings per your post seems to have done the trick. Thanks for helping!

@saspus - The only heartburn I’m having with the image may be hard to get around. Do you by chance do a disk backup to an external device? I don’t know how the app handles the situation when a storage isn’t available; I presume it just errors but continues running everything else. The problem is, no matter how I try mapping the external drive, the Duplicacy container goes down completely if the USB device is removed (because a mounted volume is removed). Any ideas?

I actually have an issue from the other end – the container refuses to start if a mounted volume is missing.

However, thinking about this – it is actually desirable behavior. There is no good way to handle a volume disappearing in flight – what’s supposed to happen to a backup in progress? If it is the destination – OK, just handle it like any other media failure; but if the volume is a source – do we back up the absence of files?

There is a workaround, of course – mount the parent folder into the container instead of the mount point itself, as sketched below. Then it becomes duplicacy’s job to deal with a disappearing destination; it would just refuse to work if the disk is missing at the beginning (with a “Storage has not been initialized” error), and if it is yanked mid-backup – well, it’s just like any other network interruption.
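
For example (a sketch; /media/usb-disk is a hypothetical auto-mount path for the external disk):

# fragile: the container loses the volume when the disk disappears
docker run ... -v /media/usb-disk:/storage ... erichough/duplicacy

# more robust: mount the parent folder, which always exists
docker run ... -v /media:/storage ... erichough/duplicacy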

------8<------ //offtopic here

What bothers me here, however (and this is another off-topic can of worms), is the use of a USB drive as a backup destination in the first place: abysmal reliability, write holes due to sudden disconnects and questionable controller behavior, and, more importantly, bit rot. Duplicacy will probably detect corruption, but not until a restore is performed; and unless you use Btrfs with DUP for both data and metadata on it, that won’t help either – the data would be unrecoverable. And using Btrfs/ZFS on it is likely also not a great idea, due to the same write hole and the USB controller doing crazy things between the filesystem and the actual disk.

I’ve noticed that too and was about to ask if I was the only one. I have another issue, however:

When most of my (other) services that run in docker die, swarm recreates them with ease. When the NAS is restarted, docker is restarted along with the apps that were running before, without a hiccup. I have alerting ready should duplicacy go down, or not come up after a reboot, but I never get paged because the container is running – I just (I guess) haven’t re-logged in? Not sure why we couldn’t get this to run in a daemon mode such that it resumes the previously configured backup schedule as soon as it comes back up; I think the app may need the ability to start without user input.

By the way, I’m not using a USB “drive” – I am fully aware of all the concerns you have listed. I am using a USB 3.1-connected HDD. Does that mitigate any of your cautions/concerns?

Unrelated but more importantly: when the container first starts up, it does not run any scheduled backups until/unless I’ve manually logged into the Web UI (and, I presume, run them; I say “presume” because I haven’t tried just logging in and back out to see if my backups run then). Is there something in the container that would be causing this behavior? Or is this part of the app’s design?

I noticed you updated the container to the new Web UI. Any chance it will pass the DWE_PASSWORD environment variable through to Duplicacy, to make it work on a headless box without entering the password: Trouble with the encryption password on a headless Linux box

Ah, awesome feature. Yes, of course, I will add it, test it, and push the update later today. Thank you for the tip!

Thinking about this, I don’t think any changes are needed in the container: you can just pass the environment variable to the container via the -e parameter, e.g. -e DWE_PASSWORD="Password".
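
For example, extending the run command from earlier in the thread (a sketch; "Password" is a placeholder and the volume list is abbreviated):

docker run --name duplicacy-web-container  \
       --publish 3875:3875/tcp             \
       --env DWE_PASSWORD="Password"       \
       --volume $temp/config:/config       \
       saspus/duplicacy-web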