Large backup size

Hey everyone,

I’ve recently set up my first backup using duplicacy. I initialized my backup folder using this command: duplicacy init -e MyBackup /mnt/user/Duplicacy.

Every time I back up my data using this command duplicacy backup -stats I can see that my backup folder grows by roughly the same amount, meaning that the backups being made are not incremental ones but rather full ones. I’ve already compared some files using duplicacy diff and could not find a reason why duplicacy had to back up every file again and again. Also, when backing up the same data using restic, the incremental snapshots are much smaller.
As an example, my initial data is around 1.4 GB; after running 3 backups in duplicacy, my backup folder is around 3 GB:

duplicacy check  -stats
Storage set to /mnt/user/Duplicacy
Listing all chunks
1 snapshots and 3 revisions
Total chunk size is 3,035M in 751 chunks
All chunks referenced by snapshot MyBackup at revision 1 exist
All chunks referenced by snapshot MyBackup at revision 2 exist
All chunks referenced by snapshot MyBackup at revision 3 exist
Snapshot MyBackup at revision 1: 3091 files (1,626M bytes), 1,013,668K total chunk bytes, 239K unique chunk bytes
Snapshot MyBackup at revision 2: 3413 files (2,616M bytes), 1,993M total chunk bytes, 295K unique chunk bytes
Snapshot MyBackup at revision 3: 3626 files (3,620M bytes), 3,012M total chunk bytes, 1,041M unique chunk bytes
Snapshot MyBackup all revisions: 3,035M total chunk bytes, 3,035M unique chunk bytes

Compared to restic, where 3 backups of my initial data total only about 800 MB.

Can someone explain to me what I am doing wrong?

Thanks in advance and happy holidays

No, it does not mean that. Duplicacy does not distinguish between full and incremental backups.

You can compare what’s different between revisions with the diff command.

What kind of data do you back up?

Hi,
the data that I primarily back up comes from Docker volumes.
To debug this further, I deleted my backup location and started all over again.
I ran duplicacy init -e MyBackup /mnt/user/Duplicacy to initialize my backup location and after that ran multiple backups directly after each other. The Docker container was stopped in the meantime to make sure that no data changed. The initial backup was about 428M, the second 428M, and the third 1.3G. The container the data originates from is a Postgres database, but as already mentioned, it was switched off.

Something is not right here. If nothing changed, the new snapshot would be zero size.

  1. What does the diff command show? There should be no new files added; verify that.
  2. Is the data on a local or network volume?
  3. Are the Docker volumes in a monolithic image, or files in a folder?
  4. Show your backup command invocation.
  5. Side note: a Postgres database should be exported, and the export backed up, to avoid having to stop the container.
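For point 5, a minimal sketch of what that export step could look like. The container name, database user, database name, and dump path below are assumptions for illustration, not details from this thread:

```shell
# Hypothetical example: dump the database to a plain-text file while the
# container keeps running, then back up the dump instead of the raw volume.
# "postgres" (container), "appuser"/"appdb", and the dump path are assumptions.
docker exec postgres pg_dump -U appuser appdb > /mnt/user/dumps/appdb.sql

# Then back up the folder containing the dump from an initialized repository.
duplicacy backup -stats
```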

The next step would be trying the fixed chunking algorithm, depending on the answers to the questions above: initialize the repository with max chunk size = min chunk size. This works much better on this kind of data.
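A sketch of what such an init could look like, using the duplicacy CLI's chunk-size flags. The 4M value here is an arbitrary illustrative choice, not a recommendation from this thread:

```shell
# Fixed-size chunking: setting the average (-c), minimum (-min), and maximum
# (-max) chunk sizes to the same value disables the content-defined splitter,
# so a small change in a large file only invalidates the chunks it touches.
duplicacy init -e -c 4M -min 4M -max 4M MyBackup /mnt/user/Duplicacy
```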

Hi,

thanks for the quick reply, I really appreciate your help.
I am backing up the files and folders of the volume itself from one disk to another on the same machine.
To initialize my backup location, I am using this command:

docker run --rm -v /mnt/user/Duplicacy:/mnt/user/Duplicacy -w /mnt/user/Duplicacy -e DUPLICACY_PASSWORD=PASSWORD cobal/duplicacy-docker:3.2.4 duplicacy init -e MyBackup /mnt/user/Duplicacy

To run my backup, I’ve created the following bash script.

The image I am using to run the backup is a fork of this one, just with a current version of duplicacy. (I was unable to find an official Docker image for the CLI, which is why I am using this one.)

The output of duplicacy list is:

Storage set to /mnt/user/Duplicacy
Snapshot MyBackup revision 1 created at 2024-12-23 18:02 -hash
Snapshot MyBackup revision 2 created at 2024-12-23 18:02 
Snapshot MyBackup revision 3 created at 2024-12-23 18:06

When running the following commands:
duplicacy diff -r 1 2

Storage set to /mnt/user/Duplicacy
No file 2 found in snapshot MyBackup at revision 1

duplicacy diff -r 2 3

Storage set to /mnt/user/Duplicacy
No file 3 found in snapshot MyBackup at revision 2

I hope that helps.

Too complicated. Why are you using a docker image for duplicacy in the first place? It’s already a self-contained, monolithic executable without dependencies. Just run it as is.

It does not. The diff command obviously failed. Please see usage here: diff · gilbertchen/duplicacy Wiki · GitHub

You probably wanted something like duplicacy diff -r 1 -r 2

/var/lib/docker/volumes/postgres-data/_data

Looks like you are backing up a binary blob. For this you would probably want to use fixed chunking. Better yet, back up the database according to the database vendor’s advice: export it to a file, and then back up that file.

Yeah, you are right, this is the output of the corrected diff command.

duplicacy diff -r 1 -r 2 

log1.txt (25.9 KB)

You are right that it makes more sense to dump a database. However, this Docker volume is just one of many. (The others consist mostly of static files.)

Since I use Unraid, I find it most convenient when I can use a Docker image :confused:

Ok. Indeed. It seems quite a few new chunks got generated.

Try comparing output from

duplicacy list -r 1 -files 

With

duplicacy list -r 2 -files 

If there is no difference, then the only explanation is that the underlying data within the files has changed.

If that’s the case, try initializing the new storage with fixed chunking and see if you can reproduce the issue. Fixed chunking works much better for image files, including databases and virtual machines.

After running duplicacy list -r 1 -files log11.txt (197.8 KB)

and duplicacy list -r 2 -files log22.txt (223.4 KB)
these are the results.

If you compare the second log with the first one using git, you can see that lines 5-160 of log22.txt contain new chunk files.

Does that help?

Lol :slight_smile: that’s a fun one

So I compared list of files that you have backed up between revisions:

# Strip the leading size/timestamp/hash columns (first ~94 characters)
# so only the file paths are compared:
diff \
    <(sed -E 's/^.{0,94}//g' log11.txt | sort) \
    <(sed -E 's/^.{0,94}//g' log22.txt | sort)

and what do I see?

> chunks/01/0c138da413310994c1a4c84d0f733908d6cf66b0ef9bc731f983b329da1114
> chunks/02/358462b8e102f54b392e228845effeafba9cf9d27efb5fc2f734e644c17ec7
...
> chunks/f8/b1f04d9b64d48beceb840ca1f294f8ed8c0bbd3e8d92e1944254cbea678627
> chunks/fd/40676a9bf847eb3eb66d953e34215b0f4fd2b430e8dbc0653402d8f9300876
> chunks/fd/53eddb959680fa347a6bc1fa5ebb9913ace1d6b588d332ce5519ac407e7cd9
> chunks/fd/da03d54a842d21722466fcfc7f38b7c7436ce7563f192189d96e4a4121d122

You are backing up either the duplicacy target storage or its cache folder… of course every backup will be double the previous one :slight_smile: ! Make sure you exclude them from the backup.

If you had not wrapped duplicacy into a docker container this would not have happened: duplicacy is smart enough to exclude its own cache from the backup, but not when it’s hidden inside a docker volume.


My bad :sweat_smile: that was indeed the issue. I guess backing up all the previous backup data is a bit too sophisticated :wink: . After adding -chunks/?* to the filters, everything works fine.
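For reference, a sketch of the resulting filters file in the repository’s .duplicacy directory. Only the -chunks/?* pattern comes from this thread; the -config and -snapshots/?* lines are my additions to also exclude the storage’s other directories, in case they sit under the repository root too:

```shell
# Write exclude patterns into the repository's filters file so the backup
# storage's own data is never picked up by the backup itself.
# (-config and -snapshots/?* are assumptions beyond the thread's actual fix.)
mkdir -p .duplicacy
cat > .duplicacy/filters <<'EOF'
-chunks/?*
-config
-snapshots/?*
EOF

# Each line starting with "-" is an exclude pattern; here there are three.
grep -c '^-' .duplicacy/filters
```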

Thanks a lot for your help, I appreciate the time you took to help me.

Happy holidays

