Best Practices for Backup Strategy

Hi. I’m planning my initial backup strategy. I’m backing up large video and audio files - about 20TB in total.

What are the best practices for this file type?

How should I set up the jobs?

What are the requirements for the local database? How much storage should I allocate for Duplicacy?

Are there any tips for the initial backup? For instance, “don’t reboot until the initial scan is completed”.

Also, any best practices guides and white papers would be appreciated.

Thanks.

Duplicacy is great for versioning data. It can chunk, de-duplicate, compress and encrypt your 20TB of data quite well and can keep a snapshot history.
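
To make that concrete, here’s a minimal sketch of the flow, assuming a B2 bucket; the snapshot ID, bucket and path are just made-up placeholders:

cd /mnt/media                                    # the repository = the folder you want to back up
duplicacy init -e media b2://my-backup-bucket    # -e enables encryption; init ties this folder to the storage
duplicacy backup -stats -threads 4               # first run uploads everything as chunks
duplicacy backup -stats -threads 4               # later runs only upload new/changed chunks and record a new snapshot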

For large media though - personally I’d use a tool like Rclone for copying/synchronising to cloud storage, or just a file sync tool for local/external storage.
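
For example (the remote name and paths are placeholders for whatever you configure with rclone config):

rclone copy /mnt/media b2:my-media-bucket/media --progress --transfers 4
rclone check /mnt/media b2:my-media-bucket/media    # verify the remote matches the source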

Local database? Duplicacy doesn’t really use one; you only need space for storage on the backup destination.

Initial backup can be interrupted, but it’ll take longer to restart. You won’t lose much bandwidth by aborting a backup, but if you have a lot of data to back up and a slow uplink, you could reduce the initial size by moving files/folders temporarily out of the repository until it’s complete. Then move more data in and run another backup. Rinse and repeat.
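
A rough sketch of that staged approach (folder names are made up - the idea is just to grow the repository between runs):

# start with a subset of the data inside the repository folder
mv /mnt/staging/videos-2019 /mnt/media/
duplicacy backup -stats

# move the next batch in and back up again
mv /mnt/staging/videos-2020 /mnt/media/
duplicacy backup -stats

# ...repeat until everything has been moved in and backed up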

I need backup, not rsync.

It has to have a local database - unless it’s stored on the target, which would be ridiculous.

Duplicacy in fact does not use a local database, and it is precisely this design that makes it so reliable - it avoids the various problems that database-backed tools run into.

Thank you for the link.

Ok. Wow. So every time a backup job is run it has to do a gazillion reads on the local storage and the remote storage?

Exactly - but since this information about the files is obtained via API calls to the storage, those lookups are insanely fast.

Take a look at this example: a 22 GB storage on B2, where a backup of about 11 MB of new file data took 1 min 40 sec.

INFO BACKUP_STATS Files: 18883 total, 22,137M bytes; 47 new, 37,209K bytes
INFO BACKUP_STATS File chunks: 20025 total, 22,471M bytes; 11 new, 10,828K bytes, 9,000K bytes uploaded
INFO BACKUP_STATS Metadata chunks: 10 total, 8,381K bytes; 5 new, 6,736K bytes, 2,427K bytes uploaded
INFO BACKUP_STATS All chunks: 20035 total, 22,480M bytes; 16 new, 17,565K bytes, 11,428K bytes uploaded
INFO BACKUP_STATS Total running time: 00:01:40

In other words, reading information about the 18,883 files already stored is very quick.
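
If I remember right, that summary is just what the CLI prints when you run the backup with the -stats option - something like:

cd /path/to/repository
duplicacy backup -stats -threads 4    # -stats prints the Files / File chunks / Metadata chunks summary shown above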

A little correction here. I think you mean a local repository. Take a look:

Rclone isn’t rsync (although I guess it does similar things). However, you can ‘back up’ your large media files more efficiently with a simple Rclone copy, or perhaps a sync with the --backup-dir flag.
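
Something along these lines (remote and paths are placeholders) - --backup-dir moves anything deleted or overwritten on the source into a dated folder on the remote instead of discarding it:

rclone sync /mnt/media b2:my-media-bucket/current \
    --backup-dir b2:my-media-bucket/archive/$(date +%Y-%m-%d) \
    --progress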

Anyway, you asked for best practices and I gave you what I consider a better practice…

Duplicacy is fantastic for most user data (and, ‘ridiculously’, doesn’t have a local database), but the process packs files into chunks on the destination. That makes them cumbersome to access directly, and with already-compressed video and audio you likely won’t benefit much from de-duplication or compression.
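
In practice that means getting a file back goes through a restore rather than browsing the destination - a minimal sketch, with the revision number and path made up:

duplicacy list                                # see which revisions exist for this repository
duplicacy restore -r 5 -- path/to/clip.mp4    # restore one file from revision 5 back into the repository folder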
