Backup larger than original?

Hi,

I just started out with Duplicacy Web 1.8.3 on my Mac running macOS 15.5. I followed the startup guide to back up one of my NVMe SSDs, containing 1.5TB of mostly images, to a SATA SSD. I was quite surprised to see that the backup, at some 1.8TB, was larger than the original. From experience with other backup tools (CrashPlan) that do dedup and compression, I was expecting no more than a 20-30% decrease in size, but definitely not an increase. I would love to check what Duplicacy is doing with my data but have no idea where to start. Can someone point me in the right direction, please?

What’s the filesystem on your external SSD? “Mostly images” won’t deduplicate or compress. But if the filesystem has a large sector size, it can waste a lot of space.
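For reference, a quick way to check on macOS (the volume path below is just a placeholder, substitute your destination volume):

```
# Show filesystem type and block size for the destination volume (path is an example)
diskutil info /Volumes/YourSSD | grep -E "File System Personality|Block Size"
```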

Furthermore, depending on what happened before the last backup, you may have a lot of trash in the storage.
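If you suspect leftovers from interrupted or deleted backups, the CLI can report unreferenced chunks without touching anything. A sketch, run from a repository initialized against that storage (double-check the prune documentation for the exact options):

```
# Scan the storage for chunks not referenced by any snapshot;
# -dry-run only reports what would be removed, nothing is deleted
duplicacy prune -exhaustive -dry-run
```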

More details are needed, and also the stats output from the check command.

Was it the first backup, or did the size grow over time in spite of the data being almost constant? Please also show the first few lines of the backup log; I want to see the command-line parameters passed to the backup command.

Thanks for responding @saspus

Both filesystems are APFS, one on NVMe one on a cheaper SATA SSD.

Not sure what you mean by ‘what happened before the last backup’? It was actually the first backup…

I did run the check job as per the user guide and copied the output from the last one:

```
Running check command from /Users/tommy/.duplicacy-web/repositories/localhost/all
Options: [-log check -storage Duplicacy_Test -a -tabular]
2025-07-19 19:34:21.519 INFO STORAGE_SET Storage set to /Volumes/Duplicacy
2025-07-19 19:34:21.549 INFO SNAPSHOT_CHECK Listing all chunks
2025-07-19 19:50:29.381 INFO SNAPSHOT_CHECK 1 snapshots and 8 revisions
2025-07-19 19:50:29.390 INFO SNAPSHOT_CHECK Total chunk size is 1722G in 367943 chunks
2025-07-19 19:50:30.154 INFO SNAPSHOT_CHECK All chunks referenced by snapshot SSD4_Test at revision 1 exist
2025-07-19 19:50:30.774 INFO SNAPSHOT_CHECK All chunks referenced by snapshot SSD4_Test at revision 2 exist
2025-07-19 19:50:31.306 INFO SNAPSHOT_CHECK All chunks referenced by snapshot SSD4_Test at revision 3 exist
2025-07-19 19:50:31.827 INFO SNAPSHOT_CHECK All chunks referenced by snapshot SSD4_Test at revision 4 exist
2025-07-19 19:50:32.397 INFO SNAPSHOT_CHECK All chunks referenced by snapshot SSD4_Test at revision 5 exist
2025-07-19 19:50:32.940 INFO SNAPSHOT_CHECK All chunks referenced by snapshot SSD4_Test at revision 6 exist
2025-07-19 19:50:33.492 INFO SNAPSHOT_CHECK All chunks referenced by snapshot SSD4_Test at revision 7 exist
2025-07-19 19:50:34.030 INFO SNAPSHOT_CHECK All chunks referenced by snapshot SSD4_Test at revision 8 exist
2025-07-19 19:50:41.956 INFO SNAPSHOT_CHECK 
      snap | rev |                          |  files | bytes | chunks | bytes |   uniq |    bytes |    new |    bytes |
 SSD4_Test |   1 | @ 2025-07-17 23:11 -hash | 250590 | 1903G | 367734 | 1721G |  73865 | 340,743M | 367734 |    1721G |
 SSD4_Test |   2 | @ 2025-07-18 20:00       | 240274 | 1444G | 293931 | 1389G |     15 |  41,827K |     62 | 115,471K |
 SSD4_Test |   3 | @ 2025-07-18 21:00       | 240271 | 1444G | 293932 | 1389G |     16 |  63,161K |     16 |  63,161K |
 SSD4_Test |   4 | @ 2025-07-18 22:00       | 240275 | 1444G | 293934 | 1389G |     25 |  67,986K |     25 |  67,986K |
 SSD4_Test |   5 | @ 2025-07-18 23:00       | 240271 | 1444G | 293935 | 1389G |     35 |  72,713K |     35 |  72,713K |
 SSD4_Test |   6 | @ 2025-07-19 17:01       | 240276 | 1444G | 293939 | 1389G |     12 |   8,361K |     52 | 108,741K |
 SSD4_Test |   7 | @ 2025-07-19 18:32       | 240278 | 1444G | 293938 | 1389G |      3 |   3,822K |     13 |  20,916K |
 SSD4_Test |   8 | @ 2025-07-19 19:22       | 240278 | 1444G | 293940 | 1389G |      5 |  11,291K |      5 |  11,291K |
 SSD4_Test | all |                          |        |       | 367942 | 1722G | 367942 |    1722G |        |          |
```

And here is what macOS thinks of the source volume:

```
df -k /Volumes/SSD4
Filesystem    1024-blocks       Used  Available Capacity iused       ifree %iused  Mounted on
/dev/disk10s3  3906813744 1516558032 2389781844    39%  320157 23897818440    0%   /Volumes/SSD4
```

Can you make something out of this?


(edited your post to put logs between triple backticks (```) for readability)

So, if I’m reading this correctly:

This was the first revision: you backed up 1903G of data in 250590 files, which was compressed/deduplicated into 1721G worth of data.

Then some files were removed from the backup set (about ten thousand of them), so the second and subsequent backups picked up 240274-240278 files, totaling about 1444G, or 1389G after compression/deduplication.

To find out which files were excluded or removed from the second and subsequent revisions, the easiest approach is to initialize a temporary duplicacy repository in a new, empty folder with the same snapshot ID, pointing to the same storage, and then use commands like duplicacy list -r <revision> -files and duplicacy diff -r 1 -r 2 with the duplicacy CLI.

Something like this (add the -e flag to the init command if you used encryption):

```
mkdir /tmp/test
cd /tmp/test
duplicacy init SSD4_Test /Volumes/Duplicacy 
duplicacy diff -r 1 -r 2
...
# when done 
rm -rf /tmp/test
```
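To see the actual file lists side by side, you could also dump them to text files and compare those (output paths are just examples; the listings include sizes and timestamps, so expect some noise in the diff):

```
# Dump the file list of each revision and compare them
duplicacy list -r 1 -files > /tmp/r1.txt
duplicacy list -r 2 -files > /tmp/r2.txt
diff /tmp/r1.txt /tmp/r2.txt | less
```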

Or, if you know that you deleted that chunk of data on purpose, you can remove the first revision from the backup with duplicacy prune -r 1.
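If you go that route, prune has a dry-run mode that only reports what would be removed, so you can preview before deleting anything (worth verifying against the prune documentation):

```
# Preview what pruning revision 1 would remove, then run it for real
duplicacy prune -r 1 -dry-run
duplicacy prune -r 1
```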

It’s probably possible to do all that in the web UI too, but it would feel like fighting windmills, to be honest.


Hi @saspus, since this was only a test and I had stopped and restarted the initial backup, I just threw away the entire backup and started again. Now I see a backup that is smaller than the original, albeit only by 7%. I am trying some more compression options to see what works best. Thanks for your help!

I’m actually surprised you even got 7%. If the data is predominantly media, it’s already compressed about as much as humanly possible, especially the lossy formats. So unless you have a substantial amount of compressible media (like old camera raw files, uncompressed video, etc.), I would not expect any meaningful compression.

Also, for media, consider copying the data as-is to write-only storage that supports object lock: media files don’t compress well, don’t deduplicate, and don’t need to be versioned, since any change is corruption that should be prevented. Using deduplicating, compressing software on top just adds complexity and risk for no benefit in return.

I get 30% compression on my main imaging archive, which holds mostly FITS images (an astrophotography format) and Canon RAW files. Those FITS files are easily compressible; zip compression is around 60%.
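If you want a rough idea of how compressible a given file type is before tuning backup settings, compressing a sample file by hand is a quick sanity check (the file name here is hypothetical):

```
# Rough compressibility check: compare the original and compressed sizes
gzip -9 -c sample.fits > /tmp/sample.fits.gz
ls -lh sample.fits /tmp/sample.fits.gz
```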