Best practice for mixed requirement backups

Hello,

I’m wondering what the best practice for “mixed requirement” backups would be.

For example, my current setup contains “mission critical” data like family albums, personal files, source code, etc., where I’m doing a BACKUP to an external HD and a COPY to Backblaze B2, and unless space ever becomes a problem, I have no plans to ever PRUNE either storage location.

At some point I’d also like to back up some less important data, for example a CD collection ripped to FLAC. At this point in time I don’t want to back it up to the external HD, but I would like to back it up to B2, and unlike the “mission critical” data, I only ever want to keep the most recent revision of this data, so I will be using PRUNE.

If I understand what I’ve read, I don’t think it will be an issue to have one B2 storage location that is accepting both a COPY from the external HD as well as a BACKUP of the CD collection. Nor will it be an issue to specifically PRUNE the CD collection without also pruning the “mission critical” data.

But I’m wondering if there would be any drawback to creating a second B2 bucket/storage location for the CD collection backup (and eventually additional backups that also don’t need revision history)? I understand there wouldn’t be any deduplication between the buckets/storage locations, but the type of data going to each location is unlikely to benefit from that anyway.

Thanks,
Rick

I think your understanding of the scenario is pretty accurate, although I would say: if there isn’t going to be much de-duplication, why not just have a secondary storage/bucket on B2 anyway?

I mean, you could always decide later to split or join the CD collection (provided the secondary storage was created with add -copy as a copy-compatible storage), although you’d incur transfer costs in doing so.
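For reference, a rough sketch of what that might look like with the CLI (the storage name, bucket name, and snapshot ID below are hypothetical, and it’s worth checking `duplicacy help add` and `duplicacy help prune` for the exact flags):

```shell
# In the CD collection repository: add a second B2 bucket as a
# copy-compatible storage derived from the existing "default" storage.
duplicacy add -copy default cd_storage cd-collection b2://my-cd-bucket

# Back up to it, then prune only this snapshot ID; -keep 0:1 removes
# revisions older than one day, approximating "keep only the most
# recent revision". Other snapshot IDs on the storage are untouched.
duplicacy backup -storage cd_storage
duplicacy prune -storage cd_storage -id cd-collection -keep 0:1
```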

Personally, because of the data type and size of files, I’d consider using Rclone for such media. Otherwise, a Duplicacy storage with fixed-size chunks (non-copy-compatible) might be a bit more efficient.


Thanks for the suggestion. One reason I was thinking to use Duplicacy is I sometimes reorganize the CD collection, so something like rclone would handle a rename by uploading the “new” file and deleting the “old” file, whereas I believe Duplicacy would have minimal or maybe even no new data to upload.

Another fairly common change is editing the tags, because we use a custom tag to identify which family member(s) want a song in their playlist, and tastes in music change over time. In this scenario Duplicacy would have to upload data to the server, but it should only ever be the first chunk if I use your suggestion of fixed-size chunks (I use foobar2000, which pads the tag section out to 4KB, so unless I exceed that limit the file size shouldn’t change and cause subsequent chunks to change), whereas rclone would have to upload the whole file.

I guess it would be good to do a small-scale test to see exactly how Duplicacy would handle these scenarios. If it’s not a huge savings then it may be best to keep things simple and just mirror the collection with something like rclone like you suggest.
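As a rough stand-in for such a test, here’s a small Python sketch of the fixed-size-chunk behaviour described above. The 1 MiB chunk size, the 4 KB padded tag block, and the stand-in “audio” bytes are all made up for the illustration, not anything Duplicacy-specific:

```python
import hashlib

CHUNK = 1024 * 1024  # hypothetical 1 MiB fixed chunk size

def fixed_chunks(data, size=CHUNK):
    """Split data into fixed-size chunks and hash each one."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

# Simulated file: a 4 KB padded tag block followed by ~4 MiB of
# stand-in "audio" bytes (both invented for the test).
audio = bytes(range(256)) * 4 * 4096
tag_v1 = b"playlist=alice".ljust(4096, b"\0")
tag_v2 = b"playlist=alice;bob".ljust(4096, b"\0")  # re-tagged in place

before = fixed_chunks(tag_v1 + audio)
after = fixed_chunks(tag_v2 + audio)

changed = [i for i, (a, b) in enumerate(zip(before, after)) if a != b]
print(changed)  # → [0]: only the chunk containing the tag differs
```

Because the edit stays inside the 4 KB padding, the file length is unchanged, every later chunk boundary falls on the same bytes, and only the first chunk would need re-uploading.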

Ah, in that case Duplicacy could well make sense. I still use Duplicacy for my music collection because most of it’s in .mp3 format and I occasionally move, rename and re-tag stuff. I have only a handful of FLACs.

The reason I mentioned fixed-size chunks is that I was imagining monolithic FLAC files as being akin to .vmdk or other virtual disk images - but then a FLAC rip can be separate tracks too, and I’m not sure if the format is deterministic in terms of binary identicality or padding (even though it’s lossless).

So your mention of the tag padding makes me wonder if variable-size chunks are more suitable anyway. With fixed-size chunks, a tag at the end of the file wouldn’t be much of a problem; if it’s at the beginning, indeed, all subsequent chunks may shift and need to be re-uploaded. Variable-size chunks suit both cases.
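The boundary-shift effect can be sketched with a toy comparison. The rolling-hash chunker and all sizes here are illustrative only, not Duplicacy’s actual algorithm; the point is just that position-based boundaries all move when data grows at the front, while content-defined boundaries resynchronise:

```python
import hashlib
import random

def fixed_chunks(data, size=4096):
    """Position-based chunking: a boundary every `size` bytes."""
    return {hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)}

def cdc_chunks(data, mask=0x0FFF, min_size=1024):
    """Toy content-defined chunker: a rolling hash over the most
    recent bytes decides boundaries, so they track content rather
    than absolute file position."""
    chunks, start, h = set(), 0, 0
    for i in range(len(data)):
        h = ((h << 1) + data[i]) & 0xFFFFFFFF
        if i - start >= min_size and (h & mask) == mask:
            chunks.add(hashlib.sha256(data[start:i + 1]).hexdigest())
            start = i + 1
    if start < len(data):
        chunks.add(hashlib.sha256(data[start:]).hexdigest())
    return chunks

random.seed(0)
payload = bytes(random.randrange(256) for _ in range(256 * 1024))
shifted = b"new 16-byte tag!" + payload  # the file grows at the front

fixed_shared = len(fixed_chunks(payload) & fixed_chunks(shifted))
cdc_a, cdc_b = cdc_chunks(payload), cdc_chunks(shifted)
cdc_shared = len(cdc_a & cdc_b)

# Fixed-size chunks all shift by 16 bytes, so (almost) none are reused;
# content-defined boundaries resynchronise soon after the change.
print(f"fixed shared: {fixed_shared}, cdc shared: {cdc_shared}/{len(cdc_a)}")
```

With the tag at the end of the file (append instead of prepend), both schemes would reuse nearly everything, which matches the “end of file wouldn’t be much of a problem” observation above.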