Multi-tiered backup, duplicacy -copy, Comcast monthly bandwidth limits

Hi,

I’m a new Duplicacy user. Most of my (family’s) computers will just use the GUI, back up c:\users\Username, and be fine with cloud-only backup. But my main home desktop has probably 1TB of disk space, and I have a set of two rotating USB drives as well as access to an unlimited Google Drive account for cloud backup.

The requirements I’d like to fulfill are two-fold:

First, I’d like to rotate the two USB 1TB drives offsite every month or so. I think this will work fine: every (say) first of the month, the backup software will realize the storage is missing a bunch of files and “catch up”. Since there is no local db, duplicacy will just see that it needs to back up a bunch of files and do so.

Second, I’d like to also back up this computer to the cloud. This seems doable with the duplicacy copy command, which keeps the storages in sync.
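For reference, I imagine the setup would look something like this on the command line (a rough sketch; the snapshot ID, storage names, drive letter and Google Drive folder are all placeholders, and the Google Drive credentials have to be set up separately):

```
# Initialize the repository against the local USB drive (storage name "default").
cd C:\Users\Me
duplicacy init -e my-desktop E:\duplicacy-storage

# Add the cloud storage as copy-compatible with the USB storage.
# -bit-identical also makes chunk files byte-for-byte identical across storages.
duplicacy add -e -copy default -bit-identical gdrive my-desktop gcd://duplicacy-backup

# Back up to the USB drive, then bring the cloud storage up to date.
duplicacy backup
duplicacy copy -from default -to gdrive
```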

I see two issues though:

  1. Is the copy command restartable? Comcast imposes a monthly 1TB upload/download limit, and my data on the home desktop alone is probably 1TB or so, so my initial sync to the cloud will need to be spread over a couple of months.
  2. If I want to keep the copies in sync, the only way I can think of is to run a duplicacy copy from driveA to driveB whenever I rotate the drives (where driveB is the drive that was, until just now, off-site). Otherwise I won’t be able to copy from driveX to the cloud, because the cloud might most recently have been synced with the “other” drive.

The other way I could think about it (and the way I do it in CrashPlan) is to just have three destinations: driveA, driveB, and cloud. CrashPlan fails gracefully when one of the destinations is missing, and either driveA or driveB is always missing, so that works. As I write this, I’m sort of leaning towards this option. It has the disadvantage, though, that if I ever want to do a complete restore, my three storages aren’t in sync, which limits options a bit (i.e., if there’s a fire and my local drive is destroyed, my remote drive would still have “most” of the files, and I could reverse-copy from cloud to drive and then restore from drive).

Another thought is that I’m just being paranoid rotating offsite backup drives, and that I shouldn’t bother and should just use the cloud as the backup to the local USB drive.

Anyway, I’m interested in people’s thoughts and/or how others have handled this.

I am only backing up my data (2 PCs + my server) to a G Suite Business Drive account (so unlimited storage, just like in your case). You could say I’m keeping all my eggs in one basket.

About the copy command: I expect it to be fairly easily resumable. If you cancel a copy, the chunks remain on the storage, so the next time you restart the copy it will see that those chunks are there and skip over them. This means, of course, that if at some point you uploaded 900GB and then cancelled, you’ll have to check and skip all 900GB of chunks before uploading the remaining 100GB.

This can be alleviated a little by using multiple threads for the copy. You have to check how many threads are optimal (Google does A LOT of rate limiting (429 errors) which isn’t explained, just silently applied). In my case, about 20 threads is the most I can do without Google killing me really, really hard.
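In practice resuming is just re-running the same copy command; a sketch (storage names assumed from however you set them up, thread count to taste):

```
# Chunks already present on the destination are listed and skipped;
# only the missing ones get uploaded. -threads is worth tuning against
# Google's rate limiting.
duplicacy copy -from default -to gdrive -threads 20
```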

The other option for the initial copy is to use Google Drive File Stream (since you’re a business user) and let that upload at its leisure. But in that case you MUST pay attention not to fill up the drive that holds the GDFS file cache. Basically, you init a new storage as a local drive instead of a Google Drive storage.
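A hypothetical sketch of that variant, assuming GDFS mounts as G: on your machine (the drive letter and folder name are made up):

```
# Point a storage at the Google Drive File Stream mount and treat it as a
# local disk; GDFS then uploads in the background. Keep an eye on free
# space on the disk that holds the GDFS cache.
duplicacy add -e -copy default -bit-identical gdfs my-desktop "G:\My Drive\duplicacy-storage"
duplicacy copy -from default -to gdfs
```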

About the whole swapping-1TB-storage-disks thing: is 1TB enough? I mean, your data is 1TB, so what are you going to do with all the future revisions when you run out of disk space? Or if your repository grows past 1TB?
I think you need bigger storage drives.

I also think this is a good solution. Cumbersome, but I don’t know how else to handle this rotation <-> sync.

1 Like

One thing to keep in mind if you’re rotating drives…

Ideally, you need to keep them in sync, i.e. bring them together and copy the snapshot revisions between them, in the proper direction, before running any backup. That way, when you run the next backup, the revision number it creates is higher than the latest on either drive. If you don’t do this properly, you’ll end up with two different revision 50s, say, and be unable to sync them up.

Instead of considering them as rotating, think of them as one on-site copy and one off-site copy. The idea is to keep them in sync at all times. If you want to rotate them to balance physical wear, you can do so, but only after sync’ing them fully.
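To make that concrete, a rotation day might look roughly like this (the storage names driveA/driveB are assumptions; the key point is copying into the returning drive before any new backup):

```
# driveB has just come back from off-site and is behind; bring it up to
# date from the on-site drive first, so the next backup's revision number
# is higher than anything on either drive.
duplicacy copy -from driveA -to driveB

# Only then run new backups (here to driveB), and take driveA off-site.
duplicacy backup -storage driveB
```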

OR, you can set the two drives up with different repository IDs. That would allow you to rotate them without sync’ing often, or at all. But you’d have a messy revision history.

BTW I can attest that copy can be resumed quite easily. The only slightly time-consuming part, or extra overhead, is when it first has to list the chunks on the storage before it resumes the copying. This can take a while if you have a lot of chunks, but it shouldn’t be an issue if you’re only doing it once in a while.

2 Likes

I have about 1TB of data, which I back up to B2 and to my home NAS (2TB). I have a second 2TB disk stored off site, and every 10 or 15 days I swap the NAS disk (it allows quick swaps).

Previously, I considered the NAS my primary backup, and B2 was my “second, off-site copy”. Today I consider the opposite: B2 is my main backup (updated daily) and the NAS is a “local copy”, which I will use if my HDD dies, so I don’t have to download hundreds of GB.

The storages (2 rotating disks in the NAS, plus B2) were created with add -copy -bit-identical, that is, they are compatible. But I don’t care whether they are synchronized. The revisions will differ slightly between them, but, in my case, that’s no problem.

About your questions:

Yes, you can interrupt and resume as many times as you want. I did this in my initial backups.

Another strategy is to use filters to add content (folders?) gradually, so your first cloud backups will be smaller/incomplete but each revision will finish cleanly. I also did this at the beginning.
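If it helps, that’s done with Duplicacy’s filters file (.duplicacy/filters in the repository); a minimal sketch, with placeholder folder names, and the exact wildcard rules are worth double-checking against the include/exclude documentation:

```
# First pass: back up only Documents; everything else stays excluded for now.
+Documents/
+Documents/*
-*
```

On later passes you add more include lines (or remove the filters file entirely) and the next backup picks up the rest.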

I also can’t see another way to do it, and I also think it’s cumbersome. I think backup should be something “automatic” that doesn’t depend on user interaction; if it’s bothersome, it won’t get done.

2 Likes

Wow. Thanks for all of the thoughts! Lots to digest, but that’s awesome!

2 Likes

Slightly tangential, but this may avoid everything discussed here :slight_smile:

Comcast does not strictly enforce the 1 TB limit. Instead, if you happen to exceed it over a number of months (two, to be precise) they will send you a warning, and only if you keep exceeding it will they charge you for overage. Since the bulk of your data will be uploaded in a single month, you don’t have to worry about extra fees.

And quoting here:

…the first two months you exceed a terabyte you will not be charged for overages, no matter how much you use during those months. You will only be subject to overage charges if you use more than a terabyte for a third time in a 12-month period.

1 Like

Totally off-topic here, but in Romania (where I’m from) I can transfer 30TB/month for about 10 USD and there has never been any sort of problem.

US internet really really sucks…

Yeah. In particular, the 1TB limit is ridiculous! Amusingly, Comcast / Xfinity wants people to pay for higher-speed tiers (e.g., up to 1Gbps service), but the 1TB limit doesn’t change by tier!

Yes, like any other service that requires physical distribution, internet is bound to be more expensive in the US. My half-educated guess is that, due to the vast size of the country and relatively low population density, the infrastructure cost per person is very high compared to most EU countries. In my (small) home country (also in south-eastern Europe) we can get unlimited 250Mbps symmetric internet for $7 a month or so. And that is actual 250Mbps, without that bullshit “up to” qualifier, fiber optic to the building and then Cat6 ethernet into condos/flats/apartments/whatever it is called :slight_smile: and no caps of any sort.

Not defending Comcast, but I guess they offer what people are buying: nobody asks for better latency, but everybody wants more bandwidth. It looks awesome on paper, and it does not matter that it’s shared, or that bufferbloat makes it unusable due to abysmal QoS; you can always pay for MORE bandwidth if the internet feels slow! (For the record, I had 12Mbps/2Mbps for a long time here, fully saturating the line 24/7 (backups, of course) and did not feel any lag in any apps thanks to proper QoS on the gateway’s WAN interface, but that’s a story for another time.)

I guess this soft 1TB limit is there to prevent abuse. It does allow legitimate use, like occasional spikes for an initial backup or a possible full restore, and on average, for the average consumer streaming HD video in the evening, it is more than enough.

And yet, they still have unlimited plans, with the actual advertised bandwidth and actually responsive and helpful tech support. It’s just that they fall under the umbrella of “business plans” (e.g. Teleworker plans) and are a bit more expensive; there are almost always discounts available that bring the cost down to the consumer-connection level, but the support and quality up to a totally new level. I kept reading about how Comcast customer support sucks and how evil the company is, and yet I had an absolutely awesome experience with them. And then it hit me: I’m using business-grade internet… it all made sense.

So I guess it’s as always: you get what you pay for :slight_smile:

Regarding your strategy of the “cloud” being the master: I like that idea. It seems pretty simple. Once I get everything set up (either in the cloud or on a local disk), I can just sync everything with a copy between storages created with -copy -bit-identical.

From then on, maybe I back up to the cloud every half hour or whatever and have a once-per-day job that does a copy from the cloud to my local disk. That way, the local disk is always an exact copy of the cloud and stays in sync. Once per month when I switch out the drives there will be more activity, but generally it won’t be a ton.
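In command terms I’m picturing something like this (storage names are just placeholders):

```
# Frequent job (every half-hour or so): back up straight to the cloud storage.
duplicacy backup -storage gdrive

# Daily job: mirror the cloud's revisions down to whichever USB drive is attached.
duplicacy copy -from gdrive -to local
```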

The biggest downside I can think of is that I double the amount of data I send over the wire every month (i.e., I upload it and then download it), plus some overhead related to checking which data is already there, but that seems potentially manageable.

Thoughts?

@cheitzig, I really don’t see any important reason that justifies keeping the storages synchronized. After all, it’s a backup, not a sync service (like Google Drive, Dropbox, and others).

There are basically two main use cases:

  • A file or folder gets lost (corrupted, hit by ransomware, modified by mistake) and you want to recover a recent version;
    (Remember that in this case, if your files are synchronized in Google Drive, Dropbox, etc., the versions from the last 30 days are saved.)

  • You lost all your files (stolen notebook, crashed HDD) and you need to recover the full backup.

Of course, if the storages are not synchronized, it may take a little extra time (2 minutes?) to identify which of your storages has the latest version. Other than that, is there any disadvantage that justifies the resources (time, download costs, etc.) needed to keep the storages synchronized?

Can someone tell me if I’m missing something? :thinking:

1 Like

I don’t think ‘synchronisation’ is the right word, or concept, for wanting to maintain two storages in that way. It’s not really about synchronising storage, but about making sure you can copy between them, which I think is the biggest advantage of that system.

You can, of course, do it any way you want, but I personally think it gets a bit messy if you go down a route that results in two storages being effectively incompatible to copy between, due to two sets of revisions.

In my case, my local storage is on 24/7 (NAS), so it makes sense in terms of efficiency to back up to local storage and then copy that to the cloud. No need to download anything from the cloud unless my local copy gets destroyed or corrupted. And if my local copy gets partially corrupted, there’s still a very good chance I can use what’s left and repair it with the cloud copy, saving bandwidth and decreasing restore times. Hence making sure I can copy between the two, in either direction, if needed.

If your local storage isn’t on 24/7 (say, an external HDD) but you want automated backups, going straight to cloud makes a lot of sense. But if your bandwidth isn’t that good in either direction and your local copy isn’t up to date, doing a full restore will be cumbersome and a two-step process: restoring from local and then restoring from cloud over the top would be the fastest way. That really isn’t too much of a problem, but what if you just want to repair a damaged local storage (maybe you pruned too much)? Or vice versa.
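That two-step restore would look roughly like this (revision numbers and storage names are placeholders):

```
# Step 1: restore as much as possible from the (possibly stale) local storage.
duplicacy restore -r 120 -storage local

# Step 2: restore the latest cloud revision over the top, overwriting
# anything the local copy had out of date.
duplicacy restore -r 150 -storage gdrive -overwrite
```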

Also, regarding -bit-identical

I see it mentioned a lot when talking about copy-compatible storages. Remember, you don’t have to use that flag unless you’re using alternative ways to copy chunks (e.g. rsync).
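For example, a bit-identical storage can be mirrored with plain file tools; something like this hypothetical rsync invocation (paths made up), which only makes sense when the destination really is meant to be an exact mirror:

```
# File-level copy of the whole storage directory; chunk files are
# byte-for-byte identical between bit-identical storages, so the result
# stays copy-compatible.
rsync -av /mnt/driveA/duplicacy-storage/ /mnt/driveB/duplicacy-storage/
```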

By rights, there should be a very slight security advantage to not using it, because an attacker wouldn’t be able to compromise both storages with knowledge of just one of the storage passwords. This mainly matters if the two storages aren’t merely copies of each other but one is a super-store for multiple systems; in that case an attacker could gain access to more data than otherwise.

1 Like

Good points, but I think in the end my solution still seems appropriate - for my use case.

Very well said, “copy compatibility” is really a more appropriate way of approaching the subject.

Yes, if your local storage is on 24/7, it’s obviously faster and it makes more sense to back up to it first. But it’s not my case.

That is my case: my NAS isn’t on 24/7, and I travel a lot for work, so the “cloud” is more accessible.

Yes, that’s exactly what I would do if I needed a full restore.

Yes, that would be the use case: if I need to copy chunks between the storages using rsync or rclone.

I agree, but in my case I use an offline password manager. If my password is compromised, I think I’d have bigger problems than my backup. Anyway, my NAS is not exposed to the internet; it’s only on the LAN, so it’s a secure “copy”. But I agree that for a regular user who doesn’t use a password manager and reuses the same password across multiple services, this would be a problem. It’s a trade-off: copy compatibility (rsync) vs. security.

In the end, the summary seems to be:

  • If you want to be able to copy chunks between the storages, create them with -bit-identical;
  • If you want to be able to make larger repairs on one storage from the other (copy snapshots, revisions), keep them in sync.

The first point costs nothing in resources and doesn’t require any different day-to-day tasks, but may have a slight security downside.

The cost and operational difficulty of the second point will depend on how the user’s storages are configured (and where they are).

1 Like

Lots to think about. Thanks for all of the replies.