A few comments.
You mean you were using the native integration and switched to the S3 gateway?
Native integration can provide MASSIVE performance, vastly exceeding that of S3, because it communicates directly with the nodes. It also provides end-to-end encryption: the key never leaves your machine.
The drawback is that you need a MASSIVE amount of compute power and upload bandwidth, because of all the compute required for sharding and erasure encoding, and the 2.5x upstream amplification. With a 1 Gbit upstream you can expect at most about 400 Mbps of useful data upload, by design.
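Roughly, the native path looks like this (bucket and file names are placeholders, and the parallelism flag is just an illustration; check uplink's help for the current options):

```
# uplink encrypts and erasure-codes locally, then talks to the nodes directly
uplink cp ./backup.tar sj://my-bucket/backup.tar --parallelism 4

# back-of-the-envelope: 2.5x amplification means a 1 Gbps uplink carries
# at most 1000 / 2.5 = 400 Mbps of useful payload
```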
On the other hand, if you had unlimited upstream bandwidth, you could reach crazy speeds. This is, however, not needed for backup, so S3 is a good choice. Having a limited upstream connection is another good indication to use S3.
If you want to use S3 and still maintain end-to-end encryption, you can run your own Storj S3 gateway on a cloud instance you control. But this is completely unnecessary, because Duplicacy supports encryption itself.
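If you did want to go that route anyway, the self-hosted gateway (gateway-st) is basically two commands. This is only a rough sketch; the exact flags and default listen address may differ, so check the gateway-st docs:

```
gateway setup   # interactive: paste your access grant, it generates S3 credentials
gateway run     # serves an S3-compatible endpoint on the instance (localhost by default)
```

Point Duplicacy at that endpoint instead of the hosted gateway and the access grant never leaves a machine you control.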
To get even better performance, increase the Duplicacy chunk size. The default is 4 MB. The closer you get to 64 MB (Storj's segment size), the faster everything will work and the more you save on segment fees.
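For example, something like this when creating the storage (the repository id, bucket and exact S3 URL syntax here are placeholders; check Duplicacy's docs for the storage URL format and current init options):

```
# -e turns on Duplicacy's own encryption, -c raises the average chunk size
# from the default 4M (min/max chunk sizes are derived from the average)
duplicacy init -e -c 16M my-repo s3://us-east-1@gateway.storjshare.io/my-bucket/duplicacy
```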
With S3 you are communicating with the gateway, which communicates with the storage nodes on your behalf. This means the gateway has to have your passphrase (as part of the access grant), which it protects with the S3 secret that it also knows. So the data is no longer end-to-end encrypted; the gateway can theoretically see it.
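You can see this in the credential flow. Something along these lines (from memory, so treat the exact command as an assumption) registers your access grant with the hosted gateway and hands you back S3 credentials:

```
# the access grant (which embeds the key derived from your passphrase)
# is handed to the gateway's auth service; you get back an S3 key and secret
uplink share --register --readonly=false sj://my-bucket
```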
Turn this off. It's a 100% waste of money. Storj encrypts everything, so if anything is corrupted, it will fail to decrypt. You either get your file back or nothing; Storj cannot return bad data, by design. It also already uses erasure coding itself, storing data as 80 shards, any 29 of which are enough to fully recover the data. So another layer of erasure coding is a 100% waste.
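From those numbers you can also see how much redundancy is already there:

```
# any 29 of the 80 pieces reconstruct the data, so up to 51 pieces can be
# lost before anything is at risk, at roughly this storage expansion factor:
echo "scale=2; 80/29" | bc   # ~2.75
```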
Incremental backups transfer so little data that it does not matter how fast they go. You can back up every 15 minutes if you want, but you should not need to. If your data changes every 15 minutes, you are probably better off with source control and/or other project-specific tracking tools. There is no need to back up too often. But you can, if you want to, of course.
On the contrary. If disaster strikes, you don't need to restore everything immediately. You'll restore what you need today. And if it takes two months to restore the rest – why is that a problem?
Since you are an Arq user, you are probably backing up to Glacier Deep Archive, and you already know that getting stuff back FAST is EXPENSIVE, but if you are not in a hurry, it's quite cheap.
This is not the case with Storj – the rate is flat. But being so preoccupied with the speed of restore and backup is missing the point a bit. Backup is by definition a background task that provides insurance. You never expect to need it, so focusing on the performance of a use case that should never happen is strange. What is important is how little it affects your other, more important tasks.
I, on the contrary, run Duplicacy under a CPU throttler, because I don't want it to go full speed and finish the backup in 5 minutes, fans blaring. I want it to work in the background, slowly, without any impact. The fact that Duplicacy is very performant lets me throttle it deeper and still manage to back up daily. I also use Arq, also throttled via its own CPU slider, and Time Machine, which is already throttled by Apple. Each backup takes 5 hours, and I'm fine with that. It's in the background and does not affect me in any way.
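If you want to do the same with the CLI, any generic limiter will do; the commands below are just an illustration, not necessarily what I run:

```
nice -n 19 duplicacy backup      # lowest scheduling priority, yields to everything else
cpulimit -l 25 duplicacy backup  # hard-cap at ~25% of one core (needs cpulimit installed)
```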
I don't use the web UI, but I remember discussions on the forum about health checks; you may be able to find them…
Check without parameters only ensures that the chunks the file manifests refer to are still present. It does not check the integrity of the chunks (for that you need to add the -chunks flag) nor the restorability of files (for that you need to add the -files flag; in both cases it's very expensive). So it essentially checks datastore consistency and protects against Duplicacy bugs that result in mismanagement of the chunk lifecycle.
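In other words:

```
duplicacy check          # cheap: verify that referenced chunks still exist in the storage
duplicacy check -chunks  # expensive: download chunks and verify their integrity
duplicacy check -files   # expensive: verify that the backed-up files can actually be restored
```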
Another job of check is to generate the statistics files that Duplicacy Web uses to display those plots on the dashboard. So if you want to see the plots, you'll have to run check periodically.
I don't run either check or prune at all. However, it's a good policy to periodically try to restore some small subset of data, just to ensure that the backup still works – both from a technical perspective and from an admin perspective: do you still have access to the keys, passcodes, etc.?
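Something like this works as an occasional fire drill (the revision number and path are placeholders; see duplicacy restore -h for the exact pattern syntax):

```
# restore a small folder from a recent revision to confirm the backup,
# the storage password and your credentials all still work
duplicacy restore -r 123 -- "Documents/some-small-folder/*"
```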