sftp is rarely a bottleneck over the internet. And if it is, and wireguard is indeed used end to end, there are other protocols available, like NFS. There is no place for WebDAV in this use case, and in my opinion duplicacy should drop it altogether: it's never suitable.
That's not the reason that applies here, and using WebDAV to serve bulk storage is contrary to the purpose and design of the protocol. Rclone provides a WebDAV adapter for WebDAV applications. Streaming, which benefits from HTTP range requests, is one of them. But that is an entirely different use case. I'd say completely opposite: a few large files with range requests vs. many small files retrieved whole.
It has its uses. Serving a backup target (thousands of small files) is not one of them. This is an overgeneralization.
I am so glad that you're not the one making these calls. I am using all the things that in your opinion should be dropped: GDrive, OneDrive, WebDAV, etc. Works quite well for me, but hey, none of these are ever suitable. /s
We had this conversation before. I don't accept half-measures and "wrong, but works quite well" solutions. I only accept well-motivated designs that always work and scale well, and not just in specific circumstances, for specific users, for indeterminate periods of time, with poor alignment of incentives. Doing otherwise provides a poor user experience.
Even though a forum is not a reflection of real-world usage, you can count topics about issues with e.g. S3 vs. e.g. gdrive to get some vague idea.
And besides, it did not work anywhere near "quite well" for one user: me. Albeit it did start out promising, on small-ish datasets in simple use cases. I suffered through it for two years, far too long in hindsight. Those issues are not fixable, for the reasons discussed before. I have never heard of anyone who had issues with AWS S3 or GCS.
Yes, anything can be made to work, even “quite well”, but I refuse to accept this copout.
Let's be honest, the only reason you use and tolerate high-latency Google Drive is because it's "free" hot storage and duplicacy does not support archival tiers. I seriously doubt it would even have been a contender had price not been a factor, or if archival storage were supported. That, or you have a tiny dataset, as these solutions don't scale.
So, instead of defending misguided compromises, how about advocating for a scalable design appropriate for the job and making the correct choices for the users.
Because a backup program that supports WebDAV among its main protocols screams "I don't know what I'm doing", or "I don't care about user experience", or "my users shall be able to shoot themselves in the foot in the worst way possible". The latter bit is important. Those solutions work somewhat in the beginning but then invariably fall apart, by which point you are in too deep.
So yes, if duplicacy dropped WebDAV and Google Drive tomorrow, some people, including you, would be pissed; some would move to another solution; and some would trust the developer and play along, migrate to a technology appropriate for the job, and be better off in the long run.
But if short-term profit is the driving force, then yes, we will keep having these posts from users about WebDAV, Google Drive, latency, and other, completely avoidable, hurdles.
I’m surprised I have to elaborate on this.
But just look at the modern software landscape: good software is few and far between. Why? Because people are content with "quite well".
Yes, this is an excellent reason to eliminate this as an option for everybody else /s. You know that yours is not the only use case out there, right? By the way, one of my datasets is about 20TB; how much have you tested with?
I don't even know if I can add anything to that. If price were not a factor, literally every single thing in the world would be different. But price is a factor, in everything. At least for people in the real world.
I am advocating for having choices for users, and some of them might not be just like you (gasp!) and have different use cases and considerations as to what is important or not for them.
Under 3TB. Above two it started getting unbearable. Prune would take weeks, and enumerating restore versions took tens of minutes. Please don't tell me I should have kept fewer versions: deleting data to make slow software happy is not a solution. On the contrary, it is an illustration that this backend does not work.
The problem here is that duplicacy, by forcing users onto hot storage, makes them accept these compromises. Archival storage is cheap, and is best suited for backup.
And besides price there is "value". Optimizing for price is foolish: lowest-cost solutions are notorious for offering poor value.
That's where we disagree the most. I think users should have no say in the inner workings of the solution. Even if having choices is great on paper, supporting all those choices is often impossible. If I know nothing about backup, and I pay money for a piece of software, I expect it to guide me to the "correct" solution. I don't want it to force me to make mundane choices, or choices in domains I'm not an expert in, or to "learn from my own mistakes".
The backup software developer is in a much better position to decide which storage providers are suitable to provide the designed user experience.
Ideally, there should be just one backend (one of the big three), with all the optimizations made to take full advantage of that one backend's specifics. Supporting seventeen hundred backends dilutes the quality of each and increases support load non-linearly: crappy backends will generate more support volume.
The software design is already fixed. Some storage providers will objectively work better than others. Hence, one provider can be chosen that provides the best experience. There is no room for user choice other than for the sake of having one. If only one provider were supported, we would not be having these chats.
The only reason we wouldn’t be having these chats is that many current users (myself included) would be using some other software. I am done with this conversation.
Above, I said that some users like yourself, who evidently prefer penny-pinching to quality, may end up leaving for the "competitors". And that's OK.
You are vastly overestimating the number of such users. Had this been my software, I would happily fire all my cost-{preoccupied,driven} users. These users generate the least profit but require tons of support (both directly, and indirectly in the form of development and testing spent on workarounds for bad backends). Been there, done that.
Regardless of the nature of the product, with a fixed amount of available resources you can either make a quality product that is not cheap, or a half-hearted, half-baked contraption riddled with "user choices" at bottom-of-the-barrel prices. Not both. You are advocating for the latter. I'm for the former.
I’m still not sure why you think that shifting the burden of making critical domain-specific choices from the vendor to the customer is an acceptable, let alone desirable, thing.
I can imagine SFTP outperforming WebDAV, but what's the concern with stability/reliability?
I've run 2 WebDAV storages for about 2 years (with check -chunks as part of the schedule), and the only issue was one corrupted chunk, which I suspect was caused by a power outage.
WebDAV's design goals (document exchange) are drastically different from what software like duplicacy requires (fast access to massive arrays of small files). As a result, there are bottlenecks in "weird" places, slowing down access in the best case, or losing data in the extreme, where e.g. the server runs out of RAM, drops the connection, or writes corrupted data due to some bug: there is a whole web server driving this under the hood, as opposed to a purpose-built engine like sftp.
Regardless of whether or not you think users should be given "choice" by any given software developer, this statement alone demonstrates why they absolutely should be given choices at all costs. You (in particular) are not qualified, nor do you have the experience of others, to judge what is a "wrong" solution.
Case in point - you claim GCD is inadequate for Duplicacy. My vast and flawless experience with it says you’re 100% wrong.
How do we resolve this conflict?
Well how about offer users some bloody “choice” and stop treating everyone like children?
Let’s be honest, the only reason you keep dissing on Google as a provider is you quite clearly have a bias against almost everything they do. They compete with Apple and it pisses you off their customers aren’t as bled dry as you are.
Compromises, such as not verifying backups because of exorbitant egress costs, perhaps?
Advising people to switch to sftp from WebDAV is one thing. (I wonder how you would have come to your conclusion had you not had the opportunity to test it for yourself.)
Advocating for the removal of all but the protocols and features you personally approve of is the height of arrogance. (I wonder how you'd feel if I said "archival tier" storage should never, ever get added to Duplicacy. And yet, even though I would never use it, I'd be very happy if it did get added.)
And yet despite all this, somehow, this is a wrong solution. Shame on gchen for steering you down this forbidden path.
I can only speak from my experience with what I think is a relatively small backup set, around 3TB, but read/write has been reasonable and restores responsive. This is with check -chunks on an ARM Odroid HC2 w/ 2GB RAM running lighttpd + mod-webdav. (Moving average, so some smoothing.)
The software developer is. Any choice or option passed to the user is one the developer did not/could not/would not make. Shall we let users pick their encryption codec and scheme? Chunk sizing? Compression codec? Datastore layout? Snapshot format? What else? Have them write their own backup software? The target backend is just another part of the "backup solution"; its choice must be made by the vendor, based on intricate knowledge and profiling of the engine design.
Easy. Another positive experience means nothing. One negative disqualifies the whole thing. The solution shall work for everyone, not just you.
Not at all. I'm against using Google Drive (OneDrive, Dropbox), etc., as S3 replacements, in the context of this discussion, yes.
I’m also saying that all Google’s free user-facing services (photos, mail, docs) are parasites sucking up your data to benefit their ad services revenue, and you should not use them.
I have nothing against Google Workspace, GCS, Domains, and multiple other excellent services and products that came out of Google. Including Go.
Let’s not lump everything into the same bucket.
Are you saying downloading the entire backup history periodically is normal? Why? You don't trust your storage provider? Then why do you keep using it?
And you can totally download 100GB/month from Glacier absolutely free, if you really need to. But I fail to see the reason why. Do you also second-guess your RAM and CPU correctness? I trust AWS and Google. I don't trust duplicacy with prune; hence, check (with no arguments) is sufficient. Egressing data is misguided.
On one hand, are you expecting every user to do that testing for themselves? That's a waste of everyone's time. On the other, you don't need to taste spoiled food to know it's toxic.
You can deduce from the design goals whether it will be a good match. The fact that it works short-term is irrelevant and inconclusive.
That's because you were given that choice and you are resisting letting it go. There are a lot more design choices that were made without consulting you, and you are perfectly content with accepting them as is.
A suitable storage backend should have been just another silent choice made as part of the design.
Correct. It's a wrong solution. It can be wrong and still work for a while at the same time; there is no contradiction.
While walking back I thought of another analogy that might help explain why egressing from the storage provider for "checking" is unsubstantiated and ill-advised.
Imagine you back up to a local NAS. The NAS is a ZFS array with monthly scrubbing. Scrubbing checksumming, redundant storage guarantees data correctness right after the scrub. Would you also run duplicacy check -chunks on top?
If yes, please explain why?
If no, then you should not egress from commercial cloud storage either, for the exact same reason. You pay them to keep your data consistent.
As I see it, broadly there are 2 opportunities for error: transport and storage.
Even with local storage, transport is more complex, so there's a greater risk of error, which check will detect. Once transported, you rely on the storage (ZFS in your example) to preserve integrity.
No system is perfect, so you mitigate risk. Where the risk is highest, and the "cost" to address it, determine priorities, and both are system-specific. There are best practices but no universal right answer.
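Mechanically, the "check" being argued about here boils down to re-reading what the storage returns and comparing it against the hash recorded at upload time. A minimal sketch of that idea (not duplicacy's actual implementation; the function names are made up for illustration):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream the file through SHA-256 so large chunks need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def verify_chunk(path: Path, expected_hex: str) -> bool:
    """Compare against the hash recorded when the chunk was uploaded.

    A mismatch means the bytes changed somewhere between upload and now,
    whether in transport or at rest.
    """
    return sha256_of(path) == expected_hex
```

Either failure mode, transport or storage, surfaces the same way: the retrieved bytes no longer hash to the recorded value.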
Getting back to the OP, were you able to connect after manually editing the settings file?
Right, every part of the process needs to provide guarantees. Perhaps unencrypted FTP is a bad idea, but SFTP or S3 provide those guarantees: transmission either succeeds and the data is intact, or it fails.
Very well said. The probability of Amazon AWS (or Google Cloud, or insert your favorite hyperscaler) losing data, whether due to flawed transport or storage, is non-zero. But it's far too small to make "full egress always because you can't trust them" even remotely a proportional response.
OneDrive had a bug a few years back where it would happily save truncated files. That is another reason to avoid an extra layer of complexity on top of bare storage.
B2 had a bug a few years back where the API would return bad data. That is another reason to avoid small vendors like B2, iDisk, pCloud, etc., who compete on price only.
In other words, as you said, you want to optimize cost and risk.
One way is to use shoddy cheap storage and egress every day, hoping the data will survive until the next check. The other way is to use storage that is inherently more reliable, if nothing else due to the sheer number of customers and volume of data stored, and the associated test coverage; and to rely on a combination of guarantees afforded by the protocols and their competent implementation, not on hopes. The alignment of incentives just adds polish and performance.
From the very beginning, Duplicacy has ticked most of the boxes for most users - a choice between local and cloud storage, or combination of both - and succeeds, precisely because it offers users flexibility. This forum, for example, wouldn’t exist otherwise. Numerous features have already been implemented because users asked for them. Duplicacy is a better product.
By your twisted logic, support for archival storage should never happen.
This is arrogant and delusional. Your failure to get GCD working, while plenty of others are able, is no reason to strip the feature you no longer use, and others do.
Same goes for WebDAV. A perfectly suitable protocol for use over wireguard. You want the developer to yank it due to your bad personal experience, for what… to "protect" users from themselves? LOL
What’s ‘free’ got to do with anything? It’s not free if we pay a bloody subscription!
Yes, it is normal - as part of a proper backup strategy.
You test it because nobody should ever simply ‘trust’ storage of any kind. Regardless of tool, regardless of storage type or reputation. You ‘trust but verify’, and employ well-proven strategies, such as 3-2-1, to mitigate against bad assumptions.
Yes. Not just check -chunks, but occasional full restores too.
The storage medium isn’t the only point of failure. Software bugs are another, with ‘non-zero’ chance. If your policy is to only ‘trust’ but never verify the software, either, then more fool you.
These are old arguments and I’m not here to convince you. Everyone else here hopefully understands 3-2-1, and verifying backups, is best practice despite @saspus saying the opposite.
Thank you guys for supporting me about this topic.
I'm not good at English, so I decided not to reply after saspus said it's not recommended several times and asked for more details days ago. Apparently he cannot help me solve the problem.
Using sftp means I need to enable ssh and share the root password/key somewhere else, or create a new user and configure permissions, and so on. That's not safe or convenient for a newbie like me.
With dufs I can serve webdav with one command, without worrying about other things like access control: just pure file sharing.
I do know I can change the URL in the json file to webdav-http to achieve this; I saw this solution in another topic.
My problem is that I need a webdav server with ssl, sharing the same password (since the storage is encrypted), to generate a "dummy config", and then change its url in the config file.
Unfortunately, I can't add this "dummy config" in the web edition without a working webdav with ssl. When adding, it needs to connect and add some files.
I would have to enable ssl for webdav temporarily and disable it at the end. I don't think that's a "solution".
First, you init (or add) the storage from the command line with http.
Then you manually add/edit the storage block in the json file.
None of this is done with the web interface.
Duplicacy supports http WebDAV but (for some reason) the web interface enforces an unnecessary https requirement. The goal is to set it up with the command line and manual editing, bypassing the web interface.
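To make the manual edit concrete, here is roughly what the storage entry might look like after the change. This is a sketch only: the storage name, user, LAN address, and path are made-up examples, and the `webdav-http` scheme is the one mentioned earlier in this thread. Stop the web UI before editing its settings file so it does not overwrite your change, and leave any other fields in the entry as they were written.

```json
{
    "name": "my-webdav-storage",
    "url": "webdav-http://backupuser@192.168.1.10/duplicacy"
}
```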
Your storage is encrypted? Mine isn’t, maybe that’s the difference.
I believe (someone correct me if I’m wrong) that Duplicacy can be used entirely from the command line. If so, encryption shouldn’t prevent you from following the steps in my post above.