Confused about something I did

Hello! I have two questions. I'm new, but I want to do this right.

My setup in brief: I currently have one backup location:
b2:MyBucket

[Question 1]
Initially I did this:
Duplicacy → Backup → Directory “D:\BigFolder\MP3s” → B2:MyBucket

Then I decided I actually wanted to back up “D:\BigFolder” → B2:MyBucket instead.

So now there are two entries under Duplicacy (Web UI) → Backup:

  • MP3
  • BigFolder

(I can also see in ‘B2:MyBucket\snapshots’ that it has both MP3s and BigFolder listed.)

If I delete/remove the backup ‘MP3s’ through the Duplicacy Web UI, will the files still be in B2’s ‘chunks’ anyway, since they were also backed up with ‘BigFolder’?

[Question 2]
I read that files are hashed based on their path name. Does this mean that moving files is ‘expensive’ in B2 operations (cost), because all those chunks would then have to be re-hashed with their new paths?

Basically, what I am asking is: should I avoid moving files around a lot within the local folders because it would increase my B2 costs?

This is an interesting question: does the Duplicacy Web UI run prune when a snapshot_id is deleted? And does it do so in a resumable way? Summoning @gchen

If it does not, you can clean up the datastore manually by running prune -exhaustive. This will delete all unreferenced chunks.

For your use case this would be irrelevant, because you are not removing any data: the data in your MP3 folder was backed up before and is still backed up now. There would be nothing to delete (apart from a small amount of overhead).

Yes. If a chunk is used by any backup, it will not be deleted. Similarly, if you already have a backup of BigFolder, adding another backup job just for BigFolder/mp3, while pointless, will not increase the amount of data stored (again, beyond a trivial amount of overhead).
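To make the "a chunk survives as long as any backup uses it" rule concrete, here is a toy mark-and-sweep model in Python. It is only a sketch under assumptions: the names (`chunk_id`, `unreferenced_chunks`) and the data layout (each snapshot ID mapped to a list of per-revision chunk sets) are hypothetical and are not Duplicacy's code or storage format. It shows that once the MP3s snapshots are gone, every MP3 chunk is still referenced by BigFolder, so an exhaustive prune would find nothing to delete.

```python
import hashlib


def chunk_id(data: bytes) -> str:
    # Hypothetical content-addressed name: a hash of the bytes, not of any file path.
    return hashlib.sha256(data).hexdigest()


def unreferenced_chunks(stored: set[str], snapshots: dict[str, list[set[str]]]) -> set[str]:
    """Mark-and-sweep: a chunk is garbage only if no revision of any snapshot ID references it."""
    referenced: set[str] = set()
    for revisions in snapshots.values():
        for chunk_set in revisions:
            referenced |= chunk_set  # mark every chunk this revision uses
    return stored - referenced  # sweep: whatever is left over is unreferenced


# Hypothetical file contents standing in for real chunks
mp3_chunks = {chunk_id(b"song1.mp3 data"), chunk_id(b"song2.mp3 data")}
other_chunks = {chunk_id(b"photos data")}

snapshots = {
    "MP3s": [mp3_chunks],                      # the original D:\BigFolder\MP3s backup
    "BigFolder": [mp3_chunks | other_chunks],  # the later D:\BigFolder backup
}
stored = mp3_chunks | other_chunks

del snapshots["MP3s"]  # remove the MP3s backup entirely
print(unreferenced_chunks(stored, snapshots))
```

Because BigFolder's revision references a superset of the MP3s chunks, the sweep comes back empty; the only things an actual prune would reclaim in this situation are the MP3s snapshot files themselves.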

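No. Move stuff around as you please. Chunks are named by a hash of their content, not by the file path, so data that has merely moved maps to chunks that are already in storage; only a few chunks around the changed file boundaries, plus the snapshot metadata, are new. If you see a dramatic increase in storage utilization, that would be a bug in the rolling hash chunking implementation.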
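As I understand Duplicacy's design document, it packs the files of a backup into one stream and splits that stream with a rolling hash. The toy Python sketch below imitates that idea to show why moving a file costs little. It is not Duplicacy's algorithm or parameters: it assumes a simplified gear-style rolling hash, tiny chunk sizes (Duplicacy's defaults are in the MiB range), and three hypothetical random "files". It then reorders the files, as a move to a different path might, and counts how many chunks of the new stream are not already stored.

```python
import hashlib
import random

# Toy parameters (assumptions, far smaller than real-world chunk sizes)
MIN_SIZE = 256
MAX_SIZE = 8192
CUT_MASK = 0xFFC00000  # cut when the top 10 bits are zero: roughly 1 in 1024 positions

_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # random value per byte value


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a gear-style rolling hash.

    Once a chunk is at least MIN_SIZE bytes long, h depends only on the last
    32 bytes, so cut points are decided by local content, not by offsets.
    """
    chunks: list[bytes] = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_SIZE and (h & CUT_MASK) == 0) or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def ids(chunks: list[bytes]) -> set[str]:
    # Chunks are identified by a hash of their content only.
    return {hashlib.sha256(c).hexdigest() for c in chunks}


# Three hypothetical 64 KiB files with random content (needs Python 3.9+)
files = [random.Random(seed).randbytes(64 * 1024) for seed in (1, 2, 3)]
before = files[0] + files[1] + files[2]  # original packing order
after = files[1] + files[0] + files[2]   # file 0 "moved", so it now sorts after file 1

stored = ids(chunk(before))
after_ids = ids(chunk(after))
new = after_ids - stored
print(f"{len(new)} new chunks out of {len(after_ids)}")
```

Only the chunks that straddle a changed file boundary (plus a chunk or two while the cut points resynchronize) come out new; everything else hashes to chunks already in storage, so nothing needs to be re-uploaded for them.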

I suggest reading the Duplicacy design document, which explains well how deduplication works: Duplicacy paper accepted by IEEE Transactions on Cloud Computing

If a backup is deleted from the Backup page, nothing is done to the storage: old backups will remain in the storage. If you want to remove them completely, you'll need to run prune with the -id option to target them specifically.

Thank you for your reply! And to you too @saspus