Incremental backup reads contents of all files despite no -hash option

I just started using Duplicacy on a Linux system (Ubuntu 16.04) that has ~600GB to back up to Wasabi.
The initial backup completed fine and was much faster than other backup tools (averaging 50MB/s upload); the full backup took 3h15m.

The command I used was:
duplicacy backup -storage wasabi -stats -threads 4

Incremental backups (using the same command and thread count) now take 35-40 minutes and read the disk at 80-100MB/s the entire time. Almost all chunks are “Skipped” and nothing is uploaded, but I was under the impression that without the -hash option the backup only checks timestamps and therefore shouldn’t need to do so much disk I/O.

Is this expected behaviour?

Did you see this line in the log?

Last backup at revision xxx found

If you see this line, then the only reason it would still scan all those files is that their timestamps changed.

You can choose a file that you know was rescanned in the second backup and run:

duplicacy history relative/path/to/file

This will tell you if the timestamps are different.
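Outside of Duplicacy, you can also note a file’s current modification time with the standard stat tool and compare it after the next backup run (the path here is just a placeholder):

stat -c '%y %n' relative/path/to/file

If that timestamp keeps moving between backups, the rescan is expected.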

Thanks for the fast reply @gchen!

You’re right - my mistake. For future readers: my backup set includes a couple of live MongoDB instances that currently handle only a very small number of writes per minute. The database files themselves, however, are 100GB and 50GB in size, and those tiny writes update the modification timestamps of the whole files, which Duplicacy then has to read in full. It still only uploads a small amount of changed data in the end though, which is great.
Overall, Duplicacy seems like an awesome tool! I’m looking forward to getting the webapp going; scheduling on Linux (and getting email notifications) seemed like it would be a pain without it.
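(For anyone wanting to schedule the CLI in the meantime, a plain cron entry should do. The paths below are only placeholders - adjust the repository directory, binary location and log file for your setup. With MAILTO set, cron emails anything written to stderr.)

# Example user crontab (crontab -e); paths are placeholders.
# stdout is appended to a log, stderr is mailed to MAILTO by cron.
MAILTO=you@example.com
30 2 * * * cd /path/to/repository && /usr/local/bin/duplicacy backup -storage wasabi -stats -threads 4 >> $HOME/duplicacy-backup.log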


I did try the history command, which ended up taking about 15 minutes to report the history for a single file (Wasabi backend, 600GB backed up, about 3.9M files total). Is this normal?

Also (sorry for the multiple posts): what happens when Duplicacy backs up a file that is currently open and being written to (as in the MongoDB case)? Would such files typically end up inconsistent in the stored backup?

Reading through the docs a bit, I see there’s support for pre- and post-backup scripts.
Would it make sense to do something like the following (rough sketch of the scripts after the list)?

  1. Pre-backup: use mongodump or a filesystem snapshot to “freeze” a consistent copy of the database files
  2. Back up the dump/snapshot with Duplicacy
  3. Post-backup: delete the dump/snapshot
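Something along these lines is what I had in mind - a rough sketch only, with placeholder hosts and paths, and assuming (as the docs seem to describe) that the scripts live in .duplicacy/scripts/, are named pre-backup and post-backup, and are made executable:

#!/bin/bash
# .duplicacy/scripts/pre-backup - dump the live databases into a directory
# inside the repository before the backup starts. Host, port and output
# path are placeholders.
set -e
mongodump --host localhost --port 27017 --out /path/to/repository/mongodump

#!/bin/bash
# .duplicacy/scripts/post-backup - remove the dump once the backup finishes.
rm -rf /path/to/repository/mongodump

Presumably the live MongoDB data directory would also need to be excluded via the filters file so that only the consistent dump gets backed up.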

This seems like quite a good idea!