Yes, that’s about right. The GUI gets simple jobs done and is probably enough for most people, but if you want finer control the CLI is actually easier, because you don’t have to fight the middleman and figure out how to tell the GUI what you want to tell the CLI. In addition, the CLI is free for personal use, so there is that.
Yes, sort of. Rclone is a sync tool (no versioning, beyond a rudimentary “copy the changed file to another folder”), while Duplicacy is a proper versioned, point-in-time backup with configurable version history. You need backup, so use Duplicacy. But since most of your files are media, I suggested rclone as an alternative, since you don’t need versioning for those files.
But if you want to keep it simple, use Duplicacy for everything, including media files. It will work just fine.
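For a rough sense of the difference, a sketch (the remote name and paths are made up, and the Duplicacy repository is assumed to have been set up already with `duplicacy init`):

```sh
# rclone: mirror the media folder to a remote; just a copy, no history
rclone sync ~/Pictures onedrive:Pictures

# Duplicacy: every run creates a new point-in-time revision you can go back to
cd ~/backup-repo
duplicacy backup -stats
```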
See, this is one of the things the CLI supports that is not exposed in the GUI. Duplicacy has a `duplicacy benchmark` command that can benchmark the storage.
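A bare-bones run from inside the repository is enough to get numbers (the options all have sensible defaults):

```sh
cd ~/backup-repo      # any repository initialized against the storage you want to test
duplicacy benchmark   # measures local disk speed, chunk splitting/encryption, and upload/download to the storage
```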
Unfortunately, this may not tell you the full picture. If your storage is empty, the benchmark will upload a few test files and check performance, and it will be fast; but once your storage fills up and the target contains hundreds of thousands of files, enumerating those files can be very slow on OneDrive, as it was designed for a different use case. Increasing the average chunk size can help here too, by reducing the number of chunks required, at the expense of slightly worse deduplication. But it will be a net benefit in the end.
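One caveat: the chunk size is a property of the storage and is fixed when the storage is created, so it has to be chosen at init time. A sketch, assuming `-c` sets the average chunk size (the names, the `one://` path, and the 16M figure are just placeholders over the default):

```sh
# Illustrative only: create a new storage with a larger average chunk size.
# (one:// is the OneDrive backend; swap in your own storage URL.)
duplicacy init -e -c 16M mybackup one://Backups/duplicacy
```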
I dislike Wasabi for their predatory pricing. They claim it’s “hot storage” at simple pricing, but in reality there are gotchas:
- You pay for a minimum of 1 TB regardless of how much you actually use.
- If you delete a file before it has been stored for 3 months, you pay a fee equal to the cost of storing it for those 3 months.
- And if you download more than you store, they can ban your account. Not a nice prospect.
It benefits from having started earlier.
Don’t worry about the crypto tokens. Storj runs a network of independent node operators located all over the world who contribute underutilized resources on their servers and get compensated for doing so. Figuring out tax codes and international payments for every jurisdiction is an insurmountable task, so they created a utility token on the Ethereum network to facilitate the payments: they calculate payments in USD, convert to tokens, and pay the operators. Operators can then convert the tokens to their local currency and deal with local taxes themselves.
As a customer, you can pay for storage with a plain old credit card. But they also accept their own tokens, because it would be silly if they didn’t.
Couldn’t agree more. I discovered Storj a few years ago and I’m using it for everything now; it’s awesome, and way underpriced for what it is. Geo-redundant storage that behaves like a worldwide CDN at $4/TB!? Come on! If you want geographic redundancy with B2 or Amazon, you pretty much have to pay double.
Good point. I was a big Microsoft services aficionado in the past, but they have kept going downhill since the Office 365 rebranding, so I no longer use or recommend them. Google Docs/Sheets and Apple Pages/Numbers more than fulfill my needs (and I was quite a power user of Excel), so paying for an Office subscription when you can have the same or better tools for free seems counterproductive.
Right. Their service talks its own native protocol, but they also run a separate gateway that bridges S3 to that protocol, so customers whose software already supports the (de facto standard) S3 protocol can easily switch to Storj with their existing tools.
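As a sketch only (the remote name and credentials are placeholders, and the endpoint is the hosted S3-compatible gateway Storj publishes; double-check it against their docs):

```sh
# Append a hypothetical rclone remote that talks to Storj through the S3 gateway.
cat >> ~/.config/rclone/rclone.conf <<'EOF'
[storj-s3]
type = s3
provider = Other
access_key_id = YOUR_ACCESS_KEY
secret_access_key = YOUR_SECRET_KEY
endpoint = gateway.storjshare.io
EOF

# Any S3-aware tool configured this way then works against Storj unchanged:
rclone lsd storj-s3:
```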
The web UI is really only useful for small tasks; the storage is intended to be used via its API by other apps, like Duplicacy, rclone, etc.
At each backup, as a command option. If you put it into the “global” options Duplicacy won’t like it, if I remember correctly.
Awesome!
Very good question.
Some filesystems have built-in checksumming support and can detect whether a file got corrupted, and even repair it from redundancy information if that is available. Systems that run such filesystems often run a periodic task (a “scrub”) that reads every file, compares checksums, and repairs any rot. This is usually a feature of NAS appliances. General-purpose home machines, including Windows and macOS boxes, don’t have that capability, and you are absolutely right: if you keep an old photo on an HDD for 7 to 10 years, it will likely rot. On an SSD it happens even faster, especially if the SSD is not kept powered.
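For reference, on a ZFS-based NAS that periodic integrity pass looks something like this (“tank” is a placeholder pool name):

```sh
zpool scrub tank     # read every block, verify checksums, repair from redundancy
zpool status tank    # shows scrub progress and any repaired or unrecoverable errors
```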
The technical term is bit rot.
This is a very valid concern that many overlook.
Duplicacy cannot distinguish file corruption from a deliberate change. It will see “oh, the file changed, let me back up this new version” and indeed back it up as a new version. If you then notice at some point in the future that a photo got corrupted, you should be able to restore the older, uncorrupted version from the backup. That’s where a long version history comes in handy, and why for something to be called a “backup” it has to have a long version history.
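In practice that recovery might look roughly like this (the revision number and file path are placeholders):

```sh
cd ~/backup-repo
duplicacy list                   # find a revision from before the corruption
# Pull just that one file back from revision 42, replacing the corrupted copy:
duplicacy restore -overwrite -r 42 -- "Photos/2015/IMG_0042.jpg"
```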
If keeping data for just 1800 days is not enough: that’s just a default; you can manually edit the command line (the `-keep` options) to specify any other retention that suits you. A common approach is to:
- keep all backups made in the last two weeks
- After two weeks, keep a version every day
- After 90 days, keep a version every week
- After one year, keep a version every month
If data does not change, new versions don’t occupy any extra space. The `-keep` parameters for the `prune` command to implement this scenario may look like so: `-keep 31:360 -keep 7:90 -keep 1:14 -all`
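Spelled out as a full command (each `-keep n:m` rule means “for revisions older than m days, keep one every n days”):

```sh
# Older than 360 days: keep one version every 31 days (roughly monthly).
# Older than 90 days:  keep one version every 7 days.
# Older than 14 days:  keep one version every day; newer than that, keep everything.
duplicacy prune -keep 31:360 -keep 7:90 -keep 1:14 -all
```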
By the way, the same concern is valid with respect to the target storage. If, for example, you are backing up to a local USB HDD, it too can rot and corrupt the chunk data that Duplicacy writes; as a result, you would not be able to restore some data.
Duplicacy offers a feature called “erasure coding” where you can configure it to write chunk data with redundancy, to be able to tolerate occasional bit flips, but it’s not a panacea. Ideally, you want to avoid backing up to isolated hard drives, as they cannot provide data consistency guarantees; cloud storage providers usually do. Storj, for example, encrypts all data by default with user-generated keys, so if the data were to get corrupted it would simply fail to decrypt; by design it can’t return bad data to you.
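Erasure coding is also configured when the storage is created. If I remember the flag correctly it takes a data:parity shard ratio; treat this as a sketch with placeholder names and check the docs for your CLI version:

```sh
# Sketch: create a storage on a USB drive with erasure coding enabled
# (5 data shards + 2 parity shards; the snapshot ID and path are placeholders).
duplicacy init -e -erasure-coding 5:2 mybackup /mnt/usb/duplicacy
```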
Someone suggested to me a radically different approach for archiving family photos and videos: a low-tech solution that involves Blu-ray M-DISC media with a stated 1000-year retention. That’s something to consider.
When you init the storage you have an option to encrypt it (see the `-e` flag of the `init` command). Then, to do both backups and restores, you need to provide that encryption password. Without it the data just looks like noise and is unreadable by anyone. (Duplicacy can use the keychain on macOS and the equivalent on Windows, so you don’t actually have to type it every time, of course.)
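A minimal sketch of that flow (the snapshot ID and storage path are placeholders):

```sh
# Create an encrypted storage, back up, and restore.
duplicacy init -e mybackup /mnt/usb/duplicacy   # prompts for an encryption password
duplicacy backup                                # asks for (or reads the saved) password
duplicacy restore -r 1                          # the same password is needed to restore
```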
For advanced scenarios Duplicacy supports an interesting feature: RSA encryption. It allows you to generate a key pair, and Duplicacy will use the public key for backups. To restore, however, you will need to provide the private key. This is useful if you back up from multiple computers to the same storage, to take advantage of Duplicacy’s cross-machine deduplication (a major selling point), but don’t want users to be able to restore each other’s data.
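If I remember the mechanics correctly, it’s a regular RSA key pair that gets attached to the storage at init time via a `-key` option; treat the exact flags below as a sketch (names and URL are placeholders) and check the docs for your CLI version:

```sh
# Generate an RSA key pair (the private key is itself passphrase-protected).
openssl genrsa -aes256 -out private.pem 2048
openssl rsa -in private.pem -pubout -out public.pem

# Initialize the encrypted storage with the public key; backups need only this.
duplicacy init -e -key public.pem mybackup one://Backups/duplicacy

# Restores additionally require the private key (and its passphrase).
duplicacy restore -key private.pem -r 1
```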