Newbie help (go easy on me please...)

So before I go committing data to wasabi I just wanted to check my process (in theory) is right…

I have 2 data sources and want to back them both up to the same local storage location and to the same Wasabi-backed storage location.

I do the following:

data source 1
#cd to data source 1
duplicacy init data-snapshot /volumeUSB1/usbshare/duplicacy
duplicacy add -copy default -bit-identical wasabi-storage wasabi-data-snapshot wasabi://eu-central-2@s3.eu-central-2.wasabisys.com/BUCKET-ID/duplicacy
duplicacy backup -stats
duplicacy copy -from default -to wasabi-storage

data source 2
#cd to data source 2
duplicacy init data2-snapshot /volumeUSB1/usbshare/duplicacy
duplicacy add -copy default -bit-identical wasabi-storage wasabi-data2-snapshot wasabi://eu-central-2@s3.eu-central-2.wasabisys.com/BUCKET-ID/duplicacy
duplicacy backup -stats
duplicacy copy -from default -to wasabi-storage

Is this the proper/best/correct way to achieve what I am after? I figured that by using the same local storage location I get the best deduplication chances, and once that has completed, a “simple” copy to the cloud location gives me some off-site security.

Thanks in advance!

What you propose will work.

A few comments here:

  1. You don’t need to add the -bit-identical flag. But if you do, you don’t have to use duplicacy copy: you can simply rclone your local storage to Wasabi (see the rclone sketch after this list), and, because this doesn’t require the decryption keys, it can be done anywhere, on any other machine.

  2. You are backing up to a local USB drive, i.e. storage that guarantees neither data consistency nor integrity, and then copying that to Wasabi, which does. See the problem? I would simply back up to both destinations separately.
    If you have to use the USB drive as an intermediary, then do use duplicacy copy with storages that are not bit-identical; in that case duplicacy has to decrypt and repack the data, discovering any potential rot along the way. However, this seems to be a local USB drive and you are going to be running the copy on the same machine, so just back up to two separate destinations. Ultimately, I would not back up to the USB drive in the first place: it’s connected to the same machine and is less reliable than your source data. Does your source filesystem support snapshots? You could use those instead for local version history.

  3. Wasabi storage is expensive per TB: they charge for a minimum of 1 TB of storage, and there is an early-deletion fee if you store any object for less than 3 months. (It’s also not very stable in my experience, but that may be anecdotal.) And they can ban your “free egress” account if you egress more than the amount of data you store. I recommend Storj or Backblaze B2 instead as cheaper, more reliable, and less annoying alternatives to Wasabi for hot storage. Storj is decentralized, has very high durability, and is end-to-end encrypted by design (and is heavily underpriced, but that’s another topic).

  4. Are these two sources accessible on the same host, and do they require the same retention rules? If so, you can create a folder, symlink your two sources into that folder, and have duplicacy back up that folder (see the sketch after this list). Duplicacy follows first-level symlinks, so this will work. This also avoids polluting your sources with a .duplicacy folder (even though there are other ways to accomplish that), and you will only have one backup and one copy job to manage. Simpler == better.

  5. Nitpick: I would not keep the storage name “default” because it’s ambiguous. Give each storage an explicit name.
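For point 1, a minimal sketch of what the rclone route could look like, assuming you have configured an rclone remote named wasabi pointing at that endpoint (the remote name and paths are placeholders):

# mirror the local duplicacy storage directory to the bucket, byte for byte
rclone sync /volumeUSB1/usbshare/duplicacy wasabi:BUCKET-ID/duplicacy

For point 4, a rough sketch of the symlink setup, with made-up paths and snapshot id:

# one folder that holds symlinks to both sources
mkdir /volume1/backup-root
ln -s /path/to/data-source-1 /volume1/backup-root/data1
ln -s /path/to/data-source-2 /volume1/backup-root/data2

# initialise and back up from that folder; duplicacy follows first-level symlinks
cd /volume1/backup-root
duplicacy init all-data /volumeUSB1/usbshare/duplicacy
duplicacy backup -stats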

Many thanks for the quick and comprehensive answer. Appreciated.

  1. Understood.

  2. Local USB makes it reasonably handy for restoring data at volume. It’s definitely not as secure as my NAS (108TB configured in RAID10), but it’s mostly media files for my Plex that I can easily re-source. The only media I really want to secure is my ripped music collection. Yes, I’m over 40 :wink: I was following the advice in the wiki, which recommends exactly against doing the backup twice: do it once and then take a copy. It cited changes made during the secondary backup process, meaning the two backups are almost never in sync.

Can I sync down from a Cloud location to my USB storage?

  3. I am in the trial stage and swiftly deleted my account after reading your comments. I first went the GDrive route but quickly realised that’s throttled and of no use to man nor beast for cloud backups.

  4. Yes they are, and that would make sense. I hadn’t really thought about retention policy at this stage…

  5. Do I rename after the init, or can I do it in the init command?

Ah, why don’t you back up to the NAS then? Or, if it is the data on the NAS that you are backing up (the volumeUSB1 path looks like something some NASes would create), does your NAS support snapshots? You may then want to just create a series of snapshots – they are very cheap and provide a local way-back machine. The probability that a (properly designed, cooled, and powered) RAID10 loses data is significantly smaller than that of a single USB drive (cost-cutting, horrible thermals, atrocious power) :slight_smile:

Since these are by nature immutable, incompressible, unique files, you don’t really need version history, strong encryption, or deduplication – pretty much everything duplicacy is good at :). Instead, you can rclone your media directly to the cloud. With some providers you can restrict the access keys so that they allow uploads but not deletes or modifications, so that if your local version rots, it won’t overwrite the good cloud version (essentially keeping your media always at version 0).
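For example, something along these lines, with a hypothetical rclone remote already set up (the remote and bucket names are placeholders; the --immutable flag makes rclone refuse to modify files that already exist at the destination):

# upload new media files only; never delete, never overwrite existing objects
rclone copy --immutable /volume1/music cloud:media-bucket/music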

LOL, same, and also have a massive ripped CDs collection :smiley:

Three comments here – 1) why do the backups need to be in sync in the first place? 2) you can start them at the same time, and then they’ll be in sync; and 3) there is a better way, see below.

The proper way to do a backup is to create a local filesystem snapshot first, and then back up that snapshot, to decouple from changes to the filesystem happening while the backup is in progress. Duplicacy does this automatically on Windows and macOS (see the -vss flag), but for all other OSes you’d need to script it yourself. In this case, you can have perfectly synchronized backups by backing up the same snapshot:

  • create snapshot fs-temp
  • mount it to /mnt/fs-temp
  • backup /mnt/fs-temp to storj
  • backup /mnt/fs-temp to usb
  • unmount the fs-temp
  • delete the fs-temp snapshot

The overhead here is that you essentially do the job twice; but duplicacy is fast, so why not? I don’t necessarily have a problem with copy, but I do have a big problem with single drives, and USB drives in particular. Furthermore, copy is also quite resource intensive, as it has to decrypt and re-encrypt the data. So the difference comes down to the filesystem scan and compression. Compression is very fast, and the second filesystem scan will be near-instant, as everything will be cached in RAM after the first scan. So it’s not that bad.

You can, but I wouldn’t – most storage providers have non-zero egress fees. Actually, the better suited a provider is for backup, the higher its egress fees, because all optimizations are done for retention, not turnover, unlike hot storage. The best one out there is Amazon Glacier Deep Archive: $1/TB to store, but with a 180-day minimum retention, and restoring data over the 100G/month threshold is anywhere from very expensive to exorbitant, depending on how fast you want your data back. Unfortunately, duplicacy does not support that kind of storage, but it may be the perfect location to sync your media library to (not just CD rips, but family photos, videos, etc.) – something that does not change and that you never expect to need to restore. But if you ever do – well, then the restore cost does not matter.
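If you do go that route, a one-line sketch with the AWS CLI (the bucket name and paths are made up):

# sync the media library straight into Deep Archive
aws s3 sync /volume1/media s3://my-archive-bucket/media --storage-class DEEP_ARCHIVE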

Google Drive is fine; it’s one of the few (the only?) *drive-type services that kind of works in these scenarios. They have, however, started enforcing quotas recently, so the trick of getting unlimited storage for $12/month no longer works. And yes, latency is huge and performance is not so good – but for backup that should not matter. I agree though, [ab]using a drive service as an object storage replacement is not sustainable.

I think init has an argument for that (-storage-name or something like that).
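If I remember correctly it would look roughly like this (the storage name here is just an example):

duplicacy init -storage-name usb-storage data-snapshot /volumeUSB1/usbshare/duplicacy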

I take snapshots on my Synology, but I’m planning for the inevitable to happen at some point in the future: some type of failure meaning I can no longer get easy access to the data locally.

The only bulk media I back up are my music files; other Plex content I don’t care about. The rest of my backups consist of personal files, documents, etc.

Re “backups in sync”, I guess they don’t have to be. Perhaps the guidance in the quick start guide needs a refresh, as it’s quite clear that “best practice” is to back up once, then copy elsewhere if you want to propagate that exact backup.

I would need to look into the command line options for taking and mounting snapshots on my Synology. I like the idea, hopefully it’s possible. If you know of any sources for such information, please do share.

Thanks again for your help. Extremely useful!

Right, this is what the offsite backup is for :slight_smile:

Then it probably makes sense to indeed back up everything with duplicacy and avoid the overhead of managing another sync solution.

Yes, it’s possible if your filesystem is btrfs. The best unbiased source I know of is man btrfs :D. The “shares” Synology presents are in fact btrfs subvolumes, so you can snapshot them directly, e.g. btrfs subvolume snapshot share-path share-path/snapshot-name. Synology has an option to make snapshots visible in a separate folder, or you can mount them manually elsewhere.

Seems super easy.

  • Create a snapshot holding folder on my btrfs fs
  • Once in the holding folder, take a snapshot – btrfs subvolume snapshot -r /volume1/homes @snapshot-homes
  • Run duplicacy on that, out to my newly created Storj account (thanks for the tip)
  • Once completed, bin the snapshot

That feels a lot cleaner than before.

Just need to script it up and ensure I’ve not made any obvious mistakes…
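A first rough sketch of the script, untested, with placeholder paths and assuming the holding folder has already been initialised once as a duplicacy repository:

#!/bin/sh
set -e

HOLDING=/volume1/snapshots            # holding folder on the same btrfs volume
SNAP=$HOLDING/@snapshot-homes         # read-only snapshot of the homes share

# 1. take a read-only snapshot of the share
btrfs subvolume snapshot -r /volume1/homes "$SNAP"

# 2. back up the snapshot (duplicacy init was run in $HOLDING beforehand)
cd "$HOLDING"
duplicacy backup -stats

# 3. bin the snapshot once the backup has finished
btrfs subvolume delete "$SNAP"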

EDIT - bonus being I can snapshot my other subvolumes into this new subvolume, then use a single duplicacy job to back up the lot. Thanks for your help and pointers.