Chunks vs Cloning Drive

448c9ab0976ab4be0bf9 · 22 November 2022 02:33

Can someone explain to me:

What are chunks and what is the point of saving files in this unreadable way?
Why not just copy and paste the files? (If you mention deduplication, please go into detail why not to use another dedupe software).
How does Duplicacy keep up with files that are moved?
Why should I use this over something like FreeFileSync?
What should I do regarding the risks of not being able to restore chunks due to corruption or software going away/not working. (this has happened to me before.

saspus · 22 November 2022 04:55

Few things to get you started

the paper Duplicacy paper accepted by IEEE Transactions on Cloud Computing
google search keywords:
- sync vs backup
- backup versioning
- content addressable storage
- bit rot
- Reed-Solomon erasure coding

Separately: what you think is a text file on the disk is a fiction. It does not exist. It is a collection of block scattered across the sectors with the database (filesystem) they tells you how to assemble them back into files. Think about it — duplicacy does something similar but takes advantage of content addressable storage to provide versioning, concurrency, and, as a side effect, deduplication.

If after studying the materials above you still have questions, feel free to ask

Droolio · 22 November 2022 15:04

To answer your final question - practice 3-2-1 backups. 3 copies of your data (including the original), on 2 physically separate media, 1 off-site. You should also do the occasional test restore (check will not catch bugs like this).

These two points apply to any backup software or methodology.

To answer #4, synchronising is not proper backup. Their default behaviour is to overwrite old versions. Infections by ransomware will quickly render your only backup copy f*cked, too - particularly if it’s a local file system.

#1: Chunking files disassociates the content from the metadata, enabling de-duplication between multiple computers, allowing incremental backups (only the modified parts of a file are stored/uploaded)., further benefiting from snapshot versioning. This answers #2 and #3.

448c9ab0976ab4be0bf9 · 22 November 2022 16:35

Saspus, thank you for your reply. I have Googled all of those things at your request. I read some of the Duplicacy paper and it seems that the value of chunking is to reduce storage space needed for backups. This is not a need that I have, so syncing may be a better backup option for me. If there is an issue with recovery using Duplicacy, is there a way to manually restore the data from the chunks? I know even with Windows filesystems, even with an issue recovery of files is pretty easy.

Droolio, thank you for your feedback. I think I understand those points, but you linked an article where someone cannot restore their files due to a bug. This again begs the question, why not just sync the files? I would hate to be in a situation of useless backup files when I could have otherwise synced. I am not worried about ransomware because it will be in cold storage.

Why is it a good idea to disassociate the content from the metadata? I can just keep it associated and be happy. I can run a separate deduplication software program regularly as desired. Please better help me understand the value of Duplicacy so I can purchase it on Black Friday if needed.

sevimo · 22 November 2022 17:03

You have clearly never tried to restore filesystems where metadata is corrupted. All filesystems have metadata, and if it is messed up, you won’t be able to recover much, even more so if your filesystem is encrypted. So for NTFS, see if you can recover anything when MFT and mirror MFT are corrupted. So from that perspective, there is no difference with storage - you have file data ( in chunks, local filesystems in sectors), and you have metadata ( in metadata chunks, local filesystems in metadata structures like MFT). In both cases, if your metadata is screwed, everything is basically gone.

The benefits of deduplication are not so much across files, but across time (snapshots). This is not something you can do by “running separate deduplication software”. Most setups do not have significant amounts of the same files in a single snapshot, but in most cases different snapshots that are close in time have massive overlap in data (e.g. files / parts of files that didn’t change). Unless your backup sets are trivial, you can’t create multiple snapshots without deduplication, your storage requirements quickly become ridiculous. With deduplication, you can run terabytes worth of daily snapshots if only small portions of them changing every day.

Not sure what you mean that your backup will be in cold storage. But if all you need is a single snapshot (e.g. some immutable documents) that you can put in a bank vault and don’t access it until recovery scenario, then you don’t really need , any copy will do. This is not really a backup strategy for live data though, for that any solution needs to be able to keep history over time.

Droolio · 22 November 2022 19:04

This is only one of the many benefits. Chunking facilitates all the other killer features that make Duplicacy the robost backup solution it is.

Saving storage space is rather important if you want to keep historic snapshot versions (in case you need to restore an old file you don’t immediately discover is missing or corrupt), while uploading incremental deltas (minimising bandwidth in addition to storage), while allowing location-independent (being able to rename, move files / folders without incurring storage / bandwidth penalties).

If this data is in cold storage, then it’s immutable and maybe you should be looking at archival software (although Duplicacy could fill that role too).

However, if you’re using a synchronisation tool to make copies of this data, you still run the risk of overwriting or deleting good data. Remember, sync != backup. Keeping numerous versions and multiple copies (3-2-1) of data is what true backups are all about.

448c9ab0976ab4be0bf9 · 22 November 2022 19:43

Sevimo, I have recovered files through a program called testdisk and photorec without any issues. All of the files just show up in a folder for me to enjoy. Perhaps I’m missing something. Would I be able to recover files the same way with Duplicacy chunks? I’m assuming not.

I appreciate the explanation about deduplication across “time” which has been an increasing issue for me.

Droolio,
FreeFileSync is pretty good at eliminating all risk of overwriting or deleting good data in my opinion.

My goals are:

Keep data backed up in a way that is easily searchable and retrievable for home use.
Keep all changes on my main computer “synced” (or archived?) properly on the backup drive.
When I add/remote/change files and folder locations, my backup drive keeps up with these changes.

It sounds like Duplicacy might be better than FreeFileSync for many reasons. What kind of archival software are you describing? I don’t know what archival software is for the most part, or any examples of open source archival software.

Would it be foolish to run multiple copies of data backups with multiple different copies of software? i.e. Duplicacy, Duplicati, and Borg

Q: Can I restore files if there is an issue with the Duplicacy feature? The encryption part scares me.
Q2: How robust is the included encryption? Are there backdoors? Can I use this software with something like Veracrypt? This is important if I am going to store my backups on the cloud.

448c9ab0976ab4be0bf9 · 26 November 2022 01:45

I purchased a lifetime license. Please help me with Question #2 if you have time.

448c9ab0976ab4be0bf9 · 9 December 2022 16:33

                                             ............

saspus · 9 December 2022 19:10

This one? To sync data you want sync software, and duplicacy is a backup software.

Or this one?

If the latter then I don’t know how to answer this question either.

You can read the ieee paper (search this forum) for some details on encryption scheme used by duplicacy.

Can you use it with Veracrypt? I don’t userstand the question. Please elaborate.

Yes, when you store data on someone else’s systems data should ideally be encrypted. That’s why duplicacy, like pretty much every other backup tool in existence supports encryption.

Still not sure how veracrypt fits here. It’s a completely unrelated product. Do you want to backup veracrypt containers with duplicacy? That would not be advisable for many reasons.