Minio vs Local Storage

benchmark

#1

Is there any advantage to using Minio over local storage, or local storage over Minio?

I have a server that I want to copy to external storage that also has Minio.

There’s literally no difference to me since it’s all local. I was just wondering if one was better behind the scenes for some reason, or if one would run faster than the other. I’m not limited on CPU, memory, or other resources either, besides maybe storage.


#2

If they’re both on the same machine, I think using local storage should be more efficient, simply because there are no extra layers of transport and data processing that your backups have to pass through (e.g. all the processing that the Minio stack has to do).


#3

The only advantage of Minio on localhost I can think of is bitrot detection, if you can spare more than one drive…
Duplicacy check is pretty slow, so another reliability layer can be useful.
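
For context, Minio only gets that bitrot detection/healing from erasure coding, which requires starting the server against several drives; a rough sketch of what that looks like (drive paths and port are placeholders, not from this thread):

# Single host, four drives: objects are erasure-coded across them, so Minio can
# detect and heal corrupted shards on read. A single-drive instance doesn't get this.
minio server --address :9000 /mnt/disk1 /mnt/disk2 /mnt/disk3 /mnt/disk4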

But if you plan to use Duplicacy copy, then it’s probably overkill.
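
For reference, the copy setup mentioned above is roughly the following (storage names, snapshot id and URL are just placeholders):

# Sketch: add a second, copy-compatible storage, then replicate chunks to it.
duplicacy add -copy default -bit-identical offsite my-server sftp://user@backup-host//Backups/duplicacy
duplicacy copy -from default -to offsite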


#4

That’s hard to say without actually measuring performance. Minio may act as a caching layer between Duplicacy and the filesystem and improve performance.

I guess it all depends on what “local” means. USB drive? DAS? NAS? The bit-rot detection may also be helpful, but some filesystems do it on their own, e.g. BTRFS or ZFS. How much memory is available to allocate to Minio caches vs the filesystem cache? Storage performance for the Duplicacy workload vs the Minio backend workload, etc., etc.
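
One note on the filesystem route: BTRFS and ZFS verify checksums on read, but data that is rarely read only gets checked during a scrub, so something like the following would have to run periodically anyway (pool and mount point names are made up):

# ZFS: read every block in the pool, verify checksums, repair from redundancy if possible
zpool scrub tank
# BTRFS: same idea for a btrfs mount
btrfs scrub start /mnt/backup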

All else being equal though, the fewer moving parts there are, the better from a reliability perspective. And after all, this is a backup tool: it should run at low priority in the background (the slower the better, so as not to rob resources from other tasks), so performance is really irrelevant. I would go with SFTP to a bit-rot aware server. That would be the most barebones solution, maximally separating responsibilities.
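
If it helps, pointing Duplicacy at such a server is just the plain SFTP storage URL; a minimal sketch (user, host and path are made up):

# Initialize an encrypted repository against an SFTP storage, then back up
duplicacy init -e my-server sftp://user@backup-host//Backups/duplicacy
duplicacy backup -stats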


#5

If you only have one form of storage, anything you can do to prevent it from failing will increase the probability of being able to successfully recover data from it. If Minio or a rot-correcting filesystem will do the job, use it. I’m running my local storage as files on an ext4 filesystem, which can rot, but I have secondary cloud storage that I check and correct against at least once every 90 days.

I debated using Minio because I have multiple machines (some offsite) and ultimately decided that SFTP would work just fine. As a bonus, there was one less hole poked in the firewall and one less service that needed securing.


#6

I love all of the well-thought-out answers. This backup is in addition to my backup to GCP. I was using Bvckup 2 to back up to a couple of external HDDs, to have a second backup in case anything ever went wrong with Duplicacy/GCP/encryption codes. Those ran out of space and I didn’t want to split the backup onto yet a third drive.

At first I was just going to use Duplicacy with some external hard drives and split the backup, hoping the deduplication would hold me over for a while. Then I decided to resurrect an old NAS I have and RAID 0 a couple of drives together so I don’t have to split into multiple backup sets. Then I decided the NAS was too slow (half the speed of GCP or less, comparing against the old initial GCP logs… hard to tell), so I bought an old 12-bay server, a second RAID card, some 8088 external connectors, etc., all cheap on eBay, and I’m going to set up a DAS.

If I attach it directly, I agree that anything additional might be unnecessary overhead. I think I’m actually going to connect it to a virtual backup server on the same machine though, and since there’s a lot of overhead with Samba/Windows file shares, I will probably use Minio instead (usually, if you ask whether a single file exists, SMB downloads the entire directory listing each time you check a file; this happens transparently, but it happens).

I also like the suggestion that using Minio will provide bitrot detection. Another reason to use it. Duplicacy check is dreadfully slow in my experience as well… understandably so, but still. I wouldn’t have even considered this without your replies, so THANK YOU!!!

My last consideration is that I have most of my servers, on-site and offsite, back up to my main server storage, which then gets backed up two more times. If I can skip my main storage, create a second encrypted bucket in Minio, and replicate that to GCP directly from Minio as the site claims, it might save me some storage space and allow me to keep more redundant snapshots.
“In addition, you may configure Minio server to continuously mirror data between Minio and any Amazon S3 compatible server.”
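
As far as I understand, that mirroring is done with the Minio client (mc) rather than the server itself; something along these lines, assuming aliases for the local server and the remote have already been created with mc alias set (all names below are placeholders):

# Continuously mirror the local duplicacy bucket to an S3-compatible remote
mc mirror --watch local/duplicacy remote/duplicacy-mirror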

Again, thank you all for your thoughtful responses.


#7

For what it’s worth, I’ve installed Minio on the same server that hosts my Duplicacy backups over SFTP, which happens to be an Intel Atom C3538-based machine with 16GB of ECC memory. I started to copy the storage from SFTP to Minio, and that would have taken three days at the rate it was going, about 30MB/sec on average.

So I aborted and ran duplicacy benchmark against each storage instead, three times in a row each, recording the results from the last run and watching CPU utilization on the server.

Minio: CPU utilization around 15% by the minio process

alexmbp:~ alex$ duplicacy benchmark --storage minio
Storage set to minio://us-east-1@tuchka.home.saspus.com:9000/duplicacy
Generating 244.14M byte random data in memory
Writing random data to local disk
Wrote 244.14M bytes in 0.41s: 600.77M/s
Reading the random data from local disk
Read 244.14M bytes in 0.04s: 5772.33M/s
Split 244.14M bytes into 50 chunks without compression/encryption in 1.52s: 160.82M/s
Split 244.14M bytes into 50 chunks with compression but without encryption in 2.04s: 119.65M/s
Split 244.14M bytes into 50 chunks with compression and encryption in 2.07s: 117.73M/s
Generating 64 chunks
Uploaded 256.00M bytes in 8.18s: 31.28M/s
Downloaded 256.00M bytes in 3.13s: 81.91M/s
Deleted 64 temporary files from the storage

SFTP: CPU utilization around 4% combined across two sshd processes

alexmbp:~ alex$ duplicacy benchmark --storage tuchka
Storage set to sftp://alex@tuchka.home.saspus.com//Backups/duplicacy
Generating 244.14M byte random data in memory
Writing random data to local disk
Wrote 244.14M bytes in 0.45s: 544.40M/s
Reading the random data from local disk
Read 244.14M bytes in 0.05s: 4940.23M/s
Split 244.14M bytes into 51 chunks without compression/encryption in 1.51s: 161.56M/s
Split 244.14M bytes into 51 chunks with compression but without encryption in 1.97s: 124.07M/s
Split 244.14M bytes into 51 chunks with compression and encryption in 2.08s: 117.57M/s
Generating 64 chunks
Uploaded 256.00M bytes in 2.95s: 86.77M/s
Downloaded 256.00M bytes in 3.35s: 76.31M/s
Deleted 64 temporary files from the storage

Why writes to Minio are 2.5 times slower I’m not sure. Perhaps there is some tweaking to be done, but for this specific use case SFTP seems to be superior, and the caching and optimization that Minio could have provided did not materialize with the default configuration. So I nuked the whole thing and will continue to use SFTP.


#8

Thank you for your comparison results. I will try this as well when I get mine set up; I won’t be able to work on it for another week or so, but I will post my results once I have them. I’ll also try it both with the default and with more upload threads (I usually use 8) to see if that makes a difference.
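
For reference, the thread counts can be passed straight to the benchmark command; something like this, if I have the flags right (the numbers are only examples):

duplicacy benchmark -storage minio -upload-threads 8 -download-threads 8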


#9

Are there any other benefits to using S3/Minio over SFTP?

As stated here (Backup using ARQ on Minio), it seems there are :)


#10

“…over SFTP like atomic writes of files (faster and less error checking required by Arq),”

Checking file sizes is not that hard, and transferring files over SFTP reliably has been polished to death.

“checksums of uploaded data (so Arq can verify the NAS received the correct data),”

This is not the job of a backup tool. This needs to be ensured by the transport, SFTP in this case. Data is encrypted during transfer; corrupted data will fail to decrypt and will be retransmitted.

“and much faster validation of data (comparing checksums instead of downloading data to compare).”

Same. Once a chunk is uploaded it must be assumed to stay the same; it is not the job of a backup solution to validate it. The host filesystem must protect it from bit rot. All validation should do is verify that the chunks required to restore files are present.
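
In Duplicacy terms that distinction would roughly be the following, if I have the flags right:

# Fast: only verify that every chunk referenced by the snapshots exists on the storage
duplicacy check -all
# Slow: additionally download every chunk and verify its hash (this is what would catch rot)
duplicacy check -all -chunks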

And yet, adding Minio adds another layer of complexity that can fail. I would trust SFTP, which has existed for decades, much more, and that’s not even mentioning the performance impact.

(Disclaimer: I may be biased, because I wholeheartedly despise Arq based on 4 months of dealing with it, their support, and their idiotic articles.)