Migrating storage to smaller avg chunk size

Hi!

I’ve been using the Duplicacy CLI for a while, but didn’t think much about proper storage design. Now, reading through the forum, I understand that for a generic personal use case (mostly a mix of documents and photos) deduplication is more efficient with a smaller-than-default chunk size.

I have my primary backup storage on a local NAS box, and I copy it with Duplicacy to a Wasabi bucket. Both storages are encrypted, use variable chunk size with the default average (4MB), and hold a number of repositories. The current size of the storage folder is ~400GB.
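For context, the existing storages were set up more or less like this (the “default” and “wasabi” names and both URLs are just placeholders for my real ones), i.e. with no chunk-size options, so everything uses the 4MB default:

    # primary storage on the NAS, encrypted, default variable chunk size (4MB average)
    duplicacy init -e "my_repository_name" "local_nas_storage_url"
    # Wasabi storage added as copy-compatible with the primary, so snapshots can be copied between them
    duplicacy add -e -copy default "wasabi" "my_repository_name" "wasabi_storage_url"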

Can you please help me with these questions:

  1. Does it make sense to migrate to a new storage where I set a 1MB average chunk size?
  2. Is there a way to assess deduplication effectiveness for my particular data set, so I could compare 4MB and 1MB chunks?
  3. If I do the migration, I was planning to use something like this to init my new storages:
    duplicacy add -e -c 1M -copy "original_storage_name" "new_local_storage_name" "my_repository_name" "new_local_storage_url"
    duplicacy add -e -c 1M -copy "original_storage_name" "new_cloud_storage_name" "my_repository_name" "new_wasabi_storage_url"
    And then run the copy operation for all my repositories, first to the local storage, then to the cloud one (see the sketch right after this list).
    Is this the optimal way to do it, or do I need to change my approach?
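To be explicit about the copy step, I mean running something like this inside each repository, assuming the add commands above really do make the new 1MB storages copy-compatible with the old one (which is part of what I’m asking):

    # copy existing snapshots from the old storage into the new local storage
    duplicacy copy -from "original_storage_name" -to "new_local_storage_name"
    # then populate the new Wasabi storage the same way (-threads can speed up the upload)
    duplicacy copy -threads 4 -from "original_storage_name" -to "new_cloud_storage_name"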

I probably wouldn’t bother if I were you. With 400GB of the data you describe, I’d guess that most of it (by total size) is photos. If that’s the case and you don’t edit/tag/move these photos around all the time, you won’t see much of a difference, if any.

The use case I can think of where smaller chunks may make sense is lots of small(er) files AND you change or move them around a lot. Even then I don’t know if you’d see a significant difference in space utilization.

Yeah, maybe you’re right. Do you know of any way to check how well deduplication works for my data? My only idea is to make a couple of backups to new storages with different chunk sizes and compare…
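Something along these lines is what I had in mind; the storage name and path here are made up for the test:

    # add a throwaway storage with a 1MB average chunk size to the same repository
    duplicacy add -e -c 1M "test_1m" "my_repository_name" "/mnt/scratch/duplicacy-test-1m"
    # back it up and compare the reported chunk counts / sizes against the existing 4MB storage
    duplicacy backup -storage "test_1m" -stats
    duplicacy check -storage "test_1m" -tabular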

Have you checked the output of check -tabular?
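That is, from the repository root, something like:

    # prints per-revision file/chunk statistics, including unique and new chunks
    duplicacy check -tabular

(add -storage with a storage name to run it against a specific storage)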

Yes, but I’m not sure how to read it. If I look at the last few revisions, I see that the “New” volume can be big, while “Unique” is significantly smaller. Does that mean the data deduplicates well and only the unique chunks were added?
Also, the last row, for “all” revisions, says that unique chunks are about 50% of total chunks. Does that mean I have a 50% deduplication rate?

| snap | rev | | files | bytes | chunks | bytes | uniq | bytes | new | bytes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Fractal-Home | 467 | @ 2022-12-01 20:55 | 71215 | 94,785M | 16532 | 79,003M | 12 | 8,527K | 30 | 134,217K |
| Fractal-Home | 468 | @ 2022-12-01 21:55 | 71224 | 94,785M | 16537 | 79,014M | 15 | 9,699K | 20 | 25,434K |
| Fractal-Home | 469 | @ 2022-12-02 06:55 | 71210 | 94,670M | 16528 | 78,951M | 14 | 10,703K | 68 | 166,051K |
| Fractal-Home | 470 | @ 2022-12-02 09:55 | 71225 | 94,673M | 16530 | 78,947M | 9 | 7,127K | 25 | 49,287K |
| Fractal-Home | 471 | @ 2022-12-02 10:55 | 71233 | 94,673M | 16533 | 78,959M | 17 | 26,961K | 24 | 44,675K |
| Fractal-Home | 472 | @ 2022-12-02 12:55 | 71240 | 94,682M | 16537 | 78,969M | 35 | 90,295K | 35 | 90,295K |
| Fractal-Home | all | | | | 32991 | 117,009M | 21166 | 63,354M | | |
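(For the “about 50%” above I’m going by the “all” row: 21166 / 32991 ≈ 64% of the chunks are unique by count, and 63,354M / 117,009M ≈ 54% by size, which is where my rough “half” comes from.)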