Partial revisions for long backups

Hi all!

I have slow and flaky internet (thanks Comcast), so large backups can take several days. This makes it very likely that a backup will be interrupted before it finishes.

Unfortunately, I also have slow, low-powered NAS machines where chunking/hashing takes significant time, even if the data has already been backed up and all of its chunks exist in the destination.

This combination makes it very difficult to finish large backups or even make any progress before the next internet interruption.

It would be really great if Duplicacy could upload a partial revision, containing only the files backed up so far, at some configurable interval (default perhaps 1 hour). This would allow an interrupted backup to continue from the last partial revision.

Thanks!

Extra details:

I was running a long backup on a Windows machine when the -vss shadow copy suddenly disappeared. This made the backup finish early and contain only the files that had been backed up so far. The next backup happily continued from that partial revision, and I saved a lot of hashing time.

A feature with exactly this behavior seems like it would address my issue.

If I’m not mistaken, such a feature could also alleviate the issue of a prune removing chunks from a backup that takes longer than a week to complete.

Extra extra details:

What I have to do now is exclude subfolders of my backup, finish a backup, and then remove the exclusions one at a time. If a folder is too large to finish in one backup, I have to manually recurse into that folder and repeat the exclusion process.
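For anyone in the same situation, the workaround looks roughly like this in the .duplicacy/filters file. This is a hedged sketch from memory of the filters syntax, and the folder names (Photos/, Videos/) are just placeholders for your own large subfolders:

    # Pass 1: exclude the biggest subfolders so the first revision can finish
    # (Photos/ and Videos/ are example names, not real paths)
    -Photos/
    -Videos/
    # everything not excluded above is included by default

    # Pass 2 (after a successful backup): remove one exclusion and run again
    -Videos/

    # Pass 3: remove the last exclusion, or recurse into Videos/ and
    # exclude its subfolders if it is still too large for one run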


Seconded. This perfectly describes the issues I have encountered. Checkpoints would be really nice.

I have a very stable connection and a pretty performant storage solution on the receiving end, but the available bandwidth isn’t the fastest and I still run into this often. My setup means I can only back up overnight, and at those speeds the amount of data that has to be rechecked keeps growing, doubly so if I’m adding more data to back up as the days go on. Because I only back up overnight, adding more data every day just makes the problem worse: more and more data has to be read from the destination HDDs and compared against the uncompleted revision before it can catch up.

A more extreme example is the initial seed of 13 TB: two Unraid boxes on a local network, so roughly 150 MB/s HDDs on either side and 1 Gbps of network bandwidth available. The problem was that one side kept rebooting every 8–36 hours. Say the backup is 10 TB in when a reboot happens: the system then has to re-read all 10 TB to verify the chunks, which takes over 18 hours before it can even start making progress again, and maybe it reaches the end before the next reboot, maybe not. It took weeks to back up what should have taken less than 30 hours, and I know it would have been practically impossible, or taken years, if the destination were off-site.
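For reference on that 18-hour figure: re-reading 10 TB at the ~150 MB/s those HDDs can sustain works out to roughly 10,000,000 MB ÷ 150 MB/s ≈ 67,000 seconds, or about 18.5 hours, before a single new chunk gets uploaded.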

There are a bunch of ways this could be done, with varying levels of complexity.

I’m imagining a CLI switch or something similar that is used for getting inconsistent or very large backups up to speed. Base it on the amount of time, chunks, files, folders, anything, even if it’s arbitrary.
Just have it slowly increase the files/folders included in the partial revision(s). Even if it isn’t 100% immutable, that’s fine.
Or even a switch that essentially backs up by uploading unreferenced or fossilized chunks, which could then be re-integrated into the normal backups automatically the next time a normal backup is run.
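To make the idea concrete, here is a purely illustrative sketch of what such a switch might look like. None of these checkpoint flags exist in Duplicacy today; the names are made up for this proposal (only -stats is a real flag):

    # Hypothetical flags, for illustration only
    duplicacy backup -stats -partial -checkpoint-every 1h      # save a partial revision every hour
    duplicacy backup -stats -partial -checkpoint-chunks 50000  # or after N uploaded chunks
    # A later normal backup would pick up from the newest partial revision
    duplicacy backup -stats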

I don’t think there is an easy way to do this, though. Implementing it would mean either a fairly large addition/rework of the backup system, or a secondary system built specifically to handle these kinds of backup jobs.