Support for Estuary as backend (IPFS/Filecoin)

Estuary is a pretty new service which builds on Filecoin and IPFS. It automatically handles the transition between the two networks: IPFS for hot storage and Filecoin for cold storage. Uploaded data is replicated to 6 Filecoin locations and then slowly discarded from IPFS.

The data can be easily retrieved via IPFS - which also offers the usual benefits of local caching and fast transfer between computers in the vicinity or on the local network.

This overall sounds like a pretty cool backend for Duplicacy, as Filecoin is a cheap way to store large amounts of data.

Only caveat: as the service is currently free and invite-only, it’s “temporarily limited to users wanting to store meaningful public data”.

Still, if it goes out of alpha this would be a great way to store backup data cheaply and efficiently :slight_smile:

Can this maybe be added to the wishlist for new storage backends?

I might be wrong, but I’m skeptical of any p2p storage service like this. Reliability is the number one concern: is there a guarantee that your files will exist when you need them?

At least I would wait until the service becomes proven.

1 Like

Hey gchen,

well, the service itself is just a “pretty frontend” to Filecoin. Filecoin itself is proven technology: the party storing your data needs to perform validation calculations on request, which they cannot do without actually storing the data.

So in the end, they wouldn’t get paid. While data loss might still happen, it’s not likely. And since it’s still bad if it does, this service keeps 6 copies of your data to make extra sure it survives. :slight_smile:
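The “validation calculations” idea can be illustrated with a toy challenge-response sketch. This is not Filecoin’s actual scheme (which uses succinct proofs of replication and spacetime, so the verifier does not need its own copy of the data); it only shows the core intuition that a provider who discarded the data cannot answer a fresh challenge:

```python
import hashlib
import os

def prove(data: bytes, nonce: bytes) -> str:
    # The provider hashes the stored data together with a fresh random
    # challenge; without the actual bytes, the right answer can't be produced.
    return hashlib.sha256(nonce + data).hexdigest()

def verify(data: bytes, nonce: bytes, proof: str) -> bool:
    # This toy verifier recomputes the proof from its own copy. Filecoin's
    # real proofs are succinct, so the verifier does NOT need the data.
    return prove(data, nonce) == proof

chunk = b"backup chunk contents"
nonce = os.urandom(16)                         # a new challenge for every check
proof = prove(chunk, nonce)
print(verify(chunk, nonce, proof))             # honest provider: True
print(verify(b"data was lost", nonce, proof))  # data gone: False
```

Because the nonce changes every time, a provider cannot precompute answers and then delete the data.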

The main advantage of this service over using “plain Filecoin” is that they do the hard work for you: choosing contract partners within Filecoin, shipping the data to them, and so on. Filecoin also has a retrieval process to get the data back, which this service handles for you as well.

Access to the data itself will be over IPFS, which is also pretty proven technology. I mean, Netflix uses it between their data centers to ship virtual server images around. :slight_smile:

So it appears it’s a “best effort” type of deal - or do they provide numerical uptime and availability guarantees? Because six copies on random users’ computers (surely located on the cheapest rotting media without checksumming) does not inspire confidence.

Think about it this way: how would Amazon S3, which operates at scale and buys hard drives at scale, not be able to provide the same quality of service much cheaper than a random user with a contraption in the closet? And you need 6x that. The only way is by using rotting media. Those checks will succeed, but when you need to restore - sorry, quite the opposite.

This may be OK for some temporary scratch storage, but not for long-term data archival - which backup essentially is - with higher durability requirements than a user’s home equipment. And my home equipment is a ZFS array with ECC RAM. If I were to offer storage from this one-off appliance, nobody in their right mind would pay the exorbitant price I would need to charge just to break even. It would be ridiculously expensive. Multiple services like these have come and gone over time.

The fact that Netflix found the service fitting their needs is more of an argument against than for: for Netflix, data integrity and availability mean nothing. Completely unimportant. Corrupted file? Ah, there will be a small glitch in a user’s stream. Missing an array entirely? Worry not, it will be fetched from the source again - thousands of sources, not counting the master (held at S3, with an SLA).
Totally the opposite of my one-of-a-kind cat picture.

1 Like

Hey saspus,

uptime needs to be close to 100% for them to receive the payout for the contract. So any long disconnection, or leaving the network, basically voids the contract.

So the user only pays if the contract is fulfilled, which means the provider stores the data and has it accessible on demand.

Checksumming is done in IPFS as well as Filecoin itself:

  • IPFS addresses data via the hash of all its chunks. So the file is guaranteed to be the one you requested when it comes out of the IPFS daemon.
  • Filecoin doesn’t tolerate bit rot either, as altered data means the deal is not fulfilled.
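The content-addressing idea in the first point can be sketched like this. It’s a simplified stand-in: real IPFS uses much larger chunks, builds a Merkle DAG, and encodes the root hash as a CID, but the self-verifying property is the same:

```python
import hashlib

def chunks(data: bytes, size: int = 4) -> list:
    # Split data into fixed-size pieces (real IPFS chunks are far larger).
    return [data[i:i + size] for i in range(0, len(data), size)]

def content_address(data: bytes) -> str:
    # Hash each chunk, then hash the concatenated chunk hashes: a flattened
    # stand-in for the Merkle DAG root that becomes the CID.
    chunk_hashes = [hashlib.sha256(c).digest() for c in chunks(data)]
    return hashlib.sha256(b"".join(chunk_hashes)).hexdigest()

# The address depends only on the content, so a request by address can
# only be answered with exactly those bytes; any bit flip is detectable.
print(content_address(b"my backup data") == content_address(b"my backup data"))   # True
print(content_address(b"my backup data") == content_address(b"my backup dataX"))  # False
```

This is why data served by any IPFS peer can be trusted: the requester re-hashes what it receives and compares it against the address it asked for.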

IPFS is more about distributing “hot” data, while Filecoin is more about archiving data cheaply on “cold” stores which are automatically checked for consistency from time to time, so you don’t have to. And you don’t pay people who lose your data.

Both protocols are made by the same community (which is backed by a startup which raised money to develop Filecoin). IPFS and Filecoin are also being evaluated for the Internet Archive (IIRC) as new backend infrastructure.

IPFS itself has a cluster application, which offers a way to make sure certain data stays available on the network - a kind of sharing of storage requirements with strangers who like to contribute to the cause, or inside a company/organization which needs to spread a large amount of data across multiple servers.

There’s a (pretty new) API which offers “pinning services” for cluster applications: you “store” the data on your local IPFS node and send a request to the cluster to “pin” it, which then asks the configured number of cluster members to store it locally.
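The request shape looks roughly like this - a sketch in the style of the IPFS Pinning Service API, where pinning is a `POST /pins` call carrying the CID. The CID and name below are illustrative placeholders:

```python
import json

def pin_request(cid: str, name: str) -> str:
    # Minimal body for a pin request in the style of the IPFS Pinning
    # Service API: you hand the service a CID, and it takes care of asking
    # its members to fetch the data from your node and hold it.
    return json.dumps({"cid": cid, "name": name})

body = pin_request("bafy-example-cid", "weekly-backup")  # placeholder values
print(body)
```

The nice part of this model is that the upload is “lazy”: you only announce the CID, and the cluster members pull the actual bytes over IPFS themselves.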

There are some “collaboration clusters” - where the project websites, for example, are stored. I run one of them (but am not affiliated with the project itself in any other way) which offers an ArchLinux package mirror service via a cluster of servers. This is kind of neat, as computers which are close to each other (by ping) try to fetch the data directly from each other. This way, updates can be received at LAN speed from other computers on the local network where possible, while the rest is fetched from the cluster.

Here’s my project site:

A list of all collaborative clusters on IPFS:

Estuary, on the other hand, offers an API to store data in IPFS locally and then request to have it archived on Filecoin. So the data is transported to their servers and then spread across 6 different Filecoin nodes. The people running Estuary select a subset of Filecoin nodes to avoid picking 6 from the same company which could then “disappear” with your data, etc.

As there are quite some running costs involved in operating Filecoin storage - the proofs of spacetime they use are pretty CPU/memory intensive - it’s extremely unlikely that storage deals in general are not fulfilled: providers need to cover their operating costs, which they only can if they fulfill their deals.

So no, no “best effort” or closet computing involved here. :slight_smile:

The price to store data on Filecoin, on the other hand, is pretty cheap. As there are a lot of players around the world offering their empty storage capacity to Filecoin deals:

You’re currently looking at $0.0000038 USD for storing one GB of data for a year on Filecoin. So about 2 cents per TB per year if you want to store 6 copies. Source:
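For checking, the arithmetic behind that 2-cents figure (using decimal TB):

```python
price_per_gb_year = 0.0000038   # USD per GB per year, the figure quoted above
copies = 6                      # replicas Estuary keeps
gb_per_tb = 1000                # decimal TB

cost = price_per_gb_year * gb_per_tb * copies
print(f"${cost:.4f} per TB per year")   # → $0.0228, i.e. roughly 2 cents
```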

Apart from that, if a Filecoin deal fails because the data is no longer provided by one of the 6 Filecoin providers, Estuary will fetch the data from the other 5 peers and create a new deal for you.
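That repair step amounts to simple replica accounting - drop the dead provider, top back up to the target count. A toy model (provider names and the candidate-selection policy are made up for illustration; the real service also re-uploads the data to the new provider):

```python
def repair(deal_providers: set, alive: set, candidates: list, target: int = 6) -> set:
    # Drop providers that no longer serve the data, then top the set back
    # up to the target replica count from a pool of fresh candidates.
    current = deal_providers & alive
    for c in candidates:
        if len(current) >= target:
            break
        current.add(c)
    return current

deals = {"p1", "p2", "p3", "p4", "p5", "p6"}
alive = deals - {"p4"}                        # one provider dropped out
repaired = repair(deals, alive, ["p7", "p8"])
print(len(repaired), "p4" in repaired)        # 6 False
```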

Hope I could clear things up a bit :slight_smile:

1 Like