Multiple backups to one storage - is this safe?

Arty.R · 12 June 2019 15:49

Or is it better to make a separate storage for each backup?

TheBestPessimist · 12 June 2019 16:01

Multiple backups to one storage

This is how you should be using Duplicacy!
If multiple computers don’t back up to the same storage, you lose much of the deduplication potential.

Arty.R · 12 June 2019 17:08

I meant something different:
separate backup jobs (for different folders) that all use one storage as the destination.
Right now I use one storage for only one backup (with many versions); the backups of other folders each go to their own separate destination.
I tried it and it works: in the snapshots folder I see a new folder with the backup name. But won’t the chunks overwrite each other?
Or is it better not to risk it and keep each archive separate?

Sorry for my poor English; it is not my primary language.

saspus · 12 June 2019 17:35

Yes, send them to the same destination to take advantage of deduplication.

Duplicacy deduplicates not only data coming from various folders but also data coming from different machines, as @TheBestPessimist pointed out; this is in fact a core killer feature that very few backup tools support.

In fact, the design does not use any knowledge of where the data comes from: the data gets shredded into chunks, and the chunks are referenced in a snapshot file. If a chunk already exists, it is reused. It does not matter how that existing chunk got there; information about its origin is neither stored nor used.

Here is a description of how it works: Lock Free Deduplication · gilbertchen/duplicacy Wiki · GitHub

Arty.R · 12 June 2019 17:43

Very interesting
I really don’t understand how identical chunks can exist in completely different data files.

saspus · 12 June 2019 17:45

And this is precisely how and why it works: chunks with the same name contain, by design, the same data. A chunk’s name is the hash of the data it contains. So if they overwrite, no harm is done. In fact, this is how deduplication works (technically there is more to it, but this is enough for a broad explanation): if the backup run is about to upload a chunk and that chunk is already in the storage, there is no need to upload anything.
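To make that concrete, here is a minimal Python sketch of a content-addressed chunk store. It is not Duplicacy's code: the names `ChunkStore`, `backup` and `restore` are made up for illustration, SHA-256 is an assumed hash, and the fixed 4-byte chunk size is only there to keep the example tiny (the real tool uses much larger, variable-size chunks).

```python
import hashlib

CHUNK_SIZE = 4  # illustrative only; real chunks are far larger


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Fixed-size splitting, used here only to keep the sketch short."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class ChunkStore:
    """Toy storage: maps a chunk name (hash of its content) to the content."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.uploads = 0  # counts chunks that actually had to be stored

    def put(self, chunk: bytes) -> str:
        name = hashlib.sha256(chunk).hexdigest()
        if name not in self.chunks:
            # Same name implies same content, so an existing chunk is
            # simply reused; nothing is uploaded a second time.
            self.chunks[name] = chunk
            self.uploads += 1
        return name


def backup(store: ChunkStore, data: bytes) -> list[str]:
    """Return a toy 'snapshot': the ordered list of chunk names."""
    return [store.put(chunk) for chunk in split_into_chunks(data)]


def restore(store: ChunkStore, snapshot: list[str]) -> bytes:
    """Rebuild the original data from the chunk names in a snapshot."""
    return b"".join(store.chunks[name] for name in snapshot)


store = ChunkStore()
snapshot_a = backup(store, b"aaaabbbbcccc")  # three new chunks
snapshot_b = backup(store, b"ddddbbbbcccc")  # only "dddd" is new
print(store.uploads)  # 4
```

Both snapshots restore correctly even though they share the `bbbb` and `cccc` chunks, and the store never records which backup a chunk first came from.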

saspus · 12 June 2019 17:52

You are right, it’s not possible. But if the data files are not absolutely different, they may contain identical parts, and that’s where it helps. Media files and zip archives are non-deduplicatable and non-compressible by nature: if it were possible to compress them further, it would already have been done. But a vast number of other files do deduplicate and compress well.

And consider another use case: you back up /Users/greg as one repository. Then you also back up /Users/emily as another. And then you back up /Users as well, because you are a nice admin and care about your users. All three backups run on different or identical schedules and have different retention/pruning settings. But the data from these three backup jobs overlaps completely! And that is not to mention that Greg and Emily have a shared photo album with literally identical files inside. So there would be a lot of shared chunks, and when you back up /Users after /Users/greg and /Users/emily have completed their backups, likely no new data will be uploaded, because it’s already there.
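The overlap can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how Duplicacy is implemented: each file is split on its own into fixed 8-byte chunks, SHA-256 is an assumed hash, and the file names and contents are invented for the example.

```python
import hashlib


def chunk_names(data: bytes, size: int = 8) -> list[str]:
    """Name each fixed-size chunk by the hash of its content."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def backup(storage: set[str], files: dict[str, bytes], root: str) -> int:
    """Back up every file under root; return how many chunks were new."""
    prefix = root.rstrip("/") + "/"
    new_chunks = 0
    for path, data in sorted(files.items()):
        if not path.startswith(prefix):
            continue
        for name in chunk_names(data):
            if name not in storage:
                storage.add(name)
                new_chunks += 1
    return new_chunks


# Invented example data: both users keep an identical copy of one photo.
files = {
    "/Users/greg/notes.txt": b"greg's private notes",
    "/Users/greg/album/beach.jpg": b"shared photo bytes 0001",
    "/Users/emily/todo.txt": b"emily's todo list",
    "/Users/emily/album/beach.jpg": b"shared photo bytes 0001",
}

storage: set[str] = set()  # one shared storage for all three jobs
greg_new = backup(storage, files, "/Users/greg")
emily_new = backup(storage, files, "/Users/emily")
users_new = backup(storage, files, "/Users")
print(greg_new, emily_new, users_new)  # 6 3 0
```

Emily's job uploads only her own file because the shared photo is already stored, and the /Users job uploads nothing at all. Real chunk boundaries depend on how the tool packs and splits the data, so real numbers will be less tidy than in this sketch.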

Arty.R · 12 June 2019 17:52

Thanks.
I understand that my whole backup scheme is wrong.

cyrond · 27 July 2022 13:30

@Arty.R just for followup: did you figure everything out?

willypo · 13 March 2024 06:42

Is there an ideal way to bring many different backups - including historical snapshots - together into one new storage (to get all the advantages described in this thread)?

Meaning, instead of:
backup1 → storage1
backup2 → storage2
backup3 → storage3

I would prefer:
backup1 + backup2 + backup3 → storage4 (and then purge the old storages)

I assume this is going to be some combo of copy/add commands, but it seems like you can only make a storage copy-compatible with one other storage in the Web UI, not with multiple (?)

saspus · 13 March 2024 08:19

Copy compatibility is mutual. If storage A is compatible with storage B, then storage B is compatible with storage A.

In reality, copy compatibility means that certain chunking algorithm parameters are the same.

So, if you have three storages that are not copy-compatible with one another, there cannot be a fourth storage that is compatible with all three: it would have to match the parameters of each of them, and those parameters differ.
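The reasoning can be checked with a small Python model. Everything in it is an assumption made for illustration: `ChunkingParams` and its three fields are hypothetical stand-ins for whatever parameters have to match, the numbers are arbitrary, and compatibility is modelled as plain equality of those parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkingParams:
    """Hypothetical stand-in for the parameters that must match."""
    average_size: int
    minimum_size: int
    maximum_size: int


def copy_compatible(a: ChunkingParams, b: ChunkingParams) -> bool:
    """Assumed model: two storages are compatible if the parameters match."""
    return a == b


# Three storages with arbitrary, mutually different parameters.
storage1 = ChunkingParams(4, 1, 16)
storage2 = ChunkingParams(8, 2, 32)
storage3 = ChunkingParams(16, 4, 64)
existing = [storage1, storage2, storage3]

# A new storage can mirror the parameters of one existing storage at most.
# Try each possible choice and see how many old storages it would match.
for chosen in existing:
    storage4 = ChunkingParams(chosen.average_size,
                              chosen.minimum_size,
                              chosen.maximum_size)
    matches = [copy_compatible(storage4, s) for s in existing]
    print(matches.count(True))  # 1 on every iteration, never 3
```

Because equality is symmetric and transitive, a fourth storage that matched all three would force the three to match each other, which contradicts the starting point.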