Interrupted initial backup not resuming

aweber · 13 October 2021 14:40

I read through this thread: Resuming interrupted backup takes hours (incomplete snapshots should be saved)

But it’s a little old, it’s basically on windows and it doesn’t cover what I think I’m seeing…

CLI Linux x64 v2.7.2…
Large initial backup to a Google Drive (storage where I already have a number of other repositories working successfully). At some point Google throws a 400 and duplicacy aborts, not entirely sure why, but maybe due to how long the backup is running. The incomplete file/status is apparently saved, because attempting to resume DOES find that info. However, after a long time of “Listing all chunks”, I get a “Skipped 0 files from previous incomplete backup”?

That doesn’t make sense. How is it possibly attempting to resume if it can not skip the files it previously uploaded???

gchen · 14 October 2021 03:10

This can happen if the first to upload is a large file and it had not been completely uploaded when the backup was aborted. You can open .duplicacy/incomplete to see its content. It is just a json file.
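For anyone who wants to look inside that file without guessing at its layout, here is a minimal Python sketch that only reports the top-level structure. It assumes the file is valid JSON at `.duplicacy/incomplete` under the repository root, as described above. The helper names `summarize` and `describe_incomplete` are hypothetical, and nothing here assumes which keys Duplicacy actually writes.

```python
import json
from pathlib import Path


def summarize(value):
    """Describe a JSON value without assuming anything about its schema."""
    if isinstance(value, dict):
        return f"object with {len(value)} keys"
    if isinstance(value, list):
        return f"array with {len(value)} items"
    return f"{type(value).__name__}: {value!r}"


def describe_incomplete(path):
    """Return one summary line per top-level entry of the JSON file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # Top level is not an object, so summarize it as a whole.
        return [summarize(data)]
    return [f"{key}: {summarize(val)}" for key, val in data.items()]
```

Calling `describe_incomplete(".duplicacy/incomplete")` from the repository root and printing each returned line shows how many entries were recorded before the abort, which helps tell "nothing was saved" apart from "something was saved but not skipped".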

aweber · 14 October 2021 12:48

No, according to the output from the backup command, it has uploaded maybe 100 files.

The incomplete file appears to list chunks that have been uploaded (assumption), but it doesn’t list what files in the local repository are complete (via the chunks). Is a restart of the backup causing duplicacy to re-scan the filesystem and re-calculate all the chunks for each file to determine whether they are already present in the remote storage? The “Listing all chunks” seems to take a very long time.

EDIT: Interesting. This time I had to restart, though the “Listing all chunks” took a long time (well over an hour), it appears that the backup resumed where it left off. Wonder why that wasn’t the case on my previous restart.

0846365cc25e66f3f819 · 19 January 2023 14:40

I have the same issue. I have a NAS with a HDD attached via USB, so I’m trying to back up my nas to the hdd. I’m connecting to NAS via ssh and running the backup. It shows “Packed XXXXXXXXXXXXX” for a few hundred files, then at some point my ssh drops (no idea why, it’s local network). when i go back and re-run duplicacy backup it shows “No previous backup found” and starts packing the same files again.

saspus · 19 January 2023 15:39

It’s resuming, only data not yet uploaded to the destination is sent to the destination.

Separately, backing up nas to a single usb hdd is highly pointless. The latter does not provide data integrity guarantees.

And lastly, to avoid relying on an active ssh session, run the backup in tmux.

0846365cc25e66f3f819 · 19 January 2023 15:53

If it’s resuming, why is it Packing the exact same files every time i run the backup?

If you have extra money, please send a second NAS for a backup. Not everyone can afford a “proper” setup.

aweber · 19 January 2023 15:59

Using an external HDD is not “pointless”. It’s an offline RAID 1. As the author already replied, it’s possibly the only affordable alternative available.

Yes, a periodic integrity check would be very beneficial (and I think duplicacy can be scheduled to do that, actually).

Absolutely, the actual backup (and check) should be scheduled offline on the NAS using whatever scheduler is available.

saspus · 19 January 2023 16:19

Your files might have changed between backups. It slices and packs them again; if this results in the same chunks that have already been uploaded, the upload is skipped. Thus, data is only uploaded once, i.e. the backup is resumed.
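The idea of re-packing everything but uploading only missing chunks can be illustrated with a toy Python sketch. This is not Duplicacy's implementation: the fixed chunk size, the SHA-256 chunk IDs, the `dict` standing in for remote storage, and the function names are all assumptions made for illustration, and Duplicacy's real chunking and hashing are more involved.

```python
import hashlib


def split_into_chunks(data, chunk_size=4):
    """Fixed-size splitting, a simplification chosen for this sketch."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def backup(data, storage, chunk_size=4):
    """Pack data into chunks and upload only those the storage lacks.

    `storage` is a plain dict standing in for the remote: chunk ID -> bytes.
    Returns (uploaded, skipped) counts.
    """
    uploaded = skipped = 0
    for chunk in split_into_chunks(data, chunk_size):
        chunk_id = hashlib.sha256(chunk).hexdigest()
        if chunk_id in storage:
            skipped += 1  # already present from an earlier run
        else:
            storage[chunk_id] = chunk
            uploaded += 1
    return uploaded, skipped
```

With an empty `storage`, a first run over `b"aaaabbbb"` uploads two chunks. A second run over `b"aaaabbbbccccdddd"` packs all four chunks again but uploads only the two new ones, which is the sense in which a re-run that repeats the packing step can still be a resume.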

Then why bother in the first place? Very likely you won’t be able to restore from the hdd. Back up to the cloud instead. Cheaper than the HDD and will actually work.

It’s RAID0, not 1. And that’s beside the point.

Cloud is cheaper, and has a benefit of not being pointless. This has been discussed many times, you can search.

Backup to a single hdd just creates a warm and fuzzy feeling that your data is protected. It isn’t: it’s less reliable than your nas, does not guarantee integrity, and will rot.

I don’t buy this “don’t have money” thing from someone who has a nas.

aweber · 19 January 2023 16:29

Sorry, I don’t mean to go off-topic, but how is it RAID0?

saspus · 19 January 2023 16:35

Raid1 is a mirror, information is replicated, so if one of the sectors rots, there is redundancy to recover the information from the other copy.

Single drive records information as is (leaving out single drive DUP btrfs profile and other crutches for now). So, if sector rots — you lose data. Depending on the filesystem and location of the rot you can lose entire filesystem. There is no way to recover.

Duplicacy erasure coding might help somewhat but it’s not a guarantee. Checksumming arrays, like btrfs and ZFS, do not merely “improve chances”; they guarantee data consistency. So do most cloud storage providers. And due to scale they do that cheaper than you can achieve at home.
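The difference between detecting rot and recovering from it can be shown with a toy model in Python. This is only a sketch of the principle, not how btrfs or ZFS are implemented; the `checksum` and `read_block` helpers are hypothetical names, and SHA-256 is an arbitrary choice.

```python
import hashlib


def checksum(block):
    """Checksum recorded when the block was originally written."""
    return hashlib.sha256(block).hexdigest()


def read_block(copies, expected):
    """Return the first copy whose checksum matches, or None if all are bad.

    `copies` holds one entry for a single drive, two for a mirror.
    """
    for copy in copies:
        if checksum(copy) == expected:
            return copy
    return None
```

With a rotten and a good copy, `read_block([rotten, good], expected)` returns the good data. With only the rotten copy, it returns `None`: the checksum reveals the damage, but there is nothing to repair it from.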

Single usb drives are suitable as scratch storage or for data transfer, never backup or other scenarios where long term integrity is required.

0846365cc25e66f3f819 · 19 January 2023 16:39

sorry, i made a mistake of thinking this forum was to ask questions and get help, instead it’s a place to get mocked and waste time.
thanks for clarifying @saspus

saspus · 19 January 2023 16:42

In my very first reply to you, in the very first paragraph, I answered your question. Then I went a step further and tried to warn you about your future data loss.

And even further, I suggested how to solve your ssh-session-dying-interrupting-backup problem.

If you decided to take this as offense — I guess you did not want help, you wanted to vent or get your wrong approach validated.

aweber · 19 January 2023 17:19

Yes, if you copy your data from your NAS to another drive, the information is replicated. If you use something like simple rsync to do it, it’s truly a mirror; if you use duplicacy it’s a dedup’ed mirror with some additional features (not to trivialize those features). You described how RAID1 functions, but didn’t answer how this scenario would be (or be closer to) RAID0.

Further to that, if you use duplicacy to backup to an external HDD, and periodically use “check -chunks”, you should be warned if your backup is inconsistent. (Next steps from that point vary, but probably you have to replace the external disk and start over.)
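A periodic check like this could be wrapped in a small Python script and handed to whatever scheduler the NAS offers. The sketch below assumes the `duplicacy` executable is on the `PATH`, that the command is run from the repository directory, and that a non-zero exit code signals a failed check; the function names are hypothetical.

```python
import subprocess


def build_check_command(executable="duplicacy"):
    """Command line for verifying every chunk, as mentioned in the post."""
    return [executable, "check", "-chunks"]


def run_check(repository_dir, executable="duplicacy"):
    """Run the check from the repository directory.

    Returns True when the process exits with code 0. Treating any other
    exit code as a failed check is an assumption of this sketch.
    """
    result = subprocess.run(build_check_command(executable), cwd=repository_dir)
    return result.returncode == 0
```

A scheduled job could call `run_check(...)` with the repository path and send a notification when it returns `False`, so an inconsistent backup is noticed before a restore is needed.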

I agree it’s not nearly “best practice” to use a single, external HDD for backup, but I wholeheartedly disagree that you should make assumptions about the poster’s financial situation.

Your implication that cloud storage would be as inexpensive as a single disk is very interesting to me…especially for anything over 1 or 2 TB. Buying an IronWolf 4TB (CMR) currently costs about $72, assuming a 5-year lifespan, how much would you pay for 4TB of cloud storage for that period of time?

sevimo · 19 January 2023 18:01

@saspus is very well known on these forums as having radical opinions that he tends to present as statements of fact. Don’t worry about it.

saspus · 19 January 2023 19:06

Yes, this is a replication. Replication is not a backup. It’s a small part of backup. From backup, you expect versioning: if your data gets corrupted you need to be able to go back in time and restore a version from 3 years ago. Let me know if I need to elaborate here.

And this is exactly why this is not an acceptable solution: starting over means losing version history. So, it’s not really a backup, it’s a replication with extra steps. You still need to have a backup. And since this (hosting backup on a single HDD) is still non-zero expense, that also provides false sense of security—it’s a damaging, expensive, and counterproductive thing to do.

NAS is a luxury: it’s an expensive way to marginally improve some aspect of data availability—speed of access. In all other respects, such as durability, reliability, and cost—it’s worse than commercial data center. So yes, if one decides to waste money on a self-hosted NAS at home (and I’m one of those people too, yes)—they can afford to implement the rest of the (expensive) obligations that come with self-hosting data.

That wasn’t an implication: that would have been unfair and unreasonable to compare cost of one ingredient (egg, HDD) with a final product (cake, data storage solution).

In other words, 4 TB single disk in your closet and 4 TB at a commercial datacenter are completely different 4 TB. To come somewhat close, you can replace this single 4 TB with a few smaller drives (to get single-disk fault tolerance without doubling the storage); add electricity cost, your time configuring and maintaining the solution (the biggest factor here), postage cost for occasional RMA of failing disks, upfront expense of equipment to host the array, and associated opportunity cost, and commercial durable redundant cloud storage cost pales in comparison.

For example, Amazon Glacier Deep Archive costs $1/TB to store the data (add cost of restore multiplied by probability of failure here), and is one of the recommended storage tiers for backup applications (Duplicacy does not support it, but that’s another topic, it should). There are other solutions that (through the magic of marketing and averaging) provide customers with unlimited capacity at fixed costs (Google Workspace, Box, DropBox – duplicacy supports those) thus making cost per TB arbitrarily small, the more you store. And those are durable solutions with SLA, not a single drive in the closet.
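The raw storage figures quoted in this thread can be put side by side with a few lines of Python. This is only a worked arithmetic sketch: it uses the $72 4 TB drive over 5 years mentioned earlier and treats the $1/TB Deep Archive figure as a monthly price, which is an assumption since the billing period is not stated here. Restore fees, electricity, time, and replacement hardware are deliberately left out, and the function names are hypothetical.

```python
def cost_per_tb_month(total_cost, capacity_tb, months):
    """Purchase price spread evenly over capacity and lifetime."""
    return total_cost / (capacity_tb * months)


def total_storage_cost(price_per_tb_month, capacity_tb, months):
    """Cumulative storage charge at a flat per-TB monthly price."""
    return price_per_tb_month * capacity_tb * months
```

Under these assumptions, `cost_per_tb_month(72, 4, 60)` comes to about $0.30 per TB per month for the bare drive, while `total_storage_cost(1.0, 4, 60)` comes to $240 for five years of cloud storage alone. Which number is the fair comparison depends on how the excluded costs are weighed.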

^^^^ This is also an opinion. Anything you disagree with can be labeled “just an opinion” and dismissed. I provide explanation and justification most of the time, unless the topic is beating a dead horse (like this one: backup to a single drive was ruled out as a viable, let alone cost-effective, backup solution long ago. It’s a false economy to fixate on the cost of the HDD, and I really don’t feel like repeating this conversation again. This topic can serve as a hint: go research this more. Or if OP is not receptive to criticism of their solution and wants to learn from their own experience, that’s fine too). I do admit that my writing can be abrasive and non-sugarcoated, but all information is there, and interested parties can research for themselves. Treat my posts as calls for action, and don’t blindly believe anyone on the internet, including myself. Most people think the HDD they bought at Costco is a suitable, let alone appropriate storage solution for backups (after all, it does say so on the box). If my radical opinion gets someone thinking, researching, reading about rot, backup, versioning, etc., and as a result not losing data in 4 years, my ramblings were not in vain.

aweber · 19 January 2023 19:39

I’m not really interested in belaboring this further, but I think you made a LOT of assumptions in your replies.

Nowhere did the poster specify what FS the NAS is running, nor what redundancy that has built-in. It could be equally susceptible to bit-rot, etc.

No one specified any retention schedules required (or version history).

You still haven’t clarified how an external disk used as described would be akin to RAID0. It would be valuable to a reader (now, or in the future) to know how much weight to give your “ramblings” when you throw around storage terminology, but don’t seem to have a grasp on industry-standard terms from 30+ years ago.

Again, “I’m out”, but I don’t think you were very helpful today.

saspus · 19 January 2023 19:43

Your single external disk is a stripe of one disk. One disk cannot be a mirror, because for mirror you need at least TWO. I don’t know how else to explain it. It’s silly to refer to a single drive as RAID of any kind to begin with, but I went along since you’ve suggested this analogy.

Backup implies version history. Otherwise it’s “sync and replication”. Google “sync vs backup”

It does not matter whatsoever. We’re discussing backup target, that OP was dangerously close to hosting on a rotting single drive. What is the source of the data is not a factor.

…and then you wrote this whole comment anyway, belaboring this further…

towerbr · 20 January 2023 13:51

Sometimes discussions here on the forum deviate a little to off-topic but they are always very good, always adding good technical insights. I particularly consider it one of the best forums I participate in.

Regarding communication, there is always format and content.

This @saspus answer cuts to the chase:

The first sentence answers the question (elaborating a little more in the following posts), the second sentence presents a good backup practice and the third gives an extra tip on using Linux.

Some may not like the straightforward format, I don’t particularly mind, and since it’s a technical forum, I even prefer it.

If you search his posts on the forum, you will see that they are always at a high technical level and he is always helpful when asked for more details.

Just my personal opinion…

Droolio · 20 January 2023 23:29

You can safely ignore those who openly admit to not practicing 3-2-1 backup strategies, who don’t verify their backups, and who don’t even use Duplicacy with their preferred, single, cloud backup provider.

External HDDs are perfectly fine for storing a backup copy.

(So long as you have other copies and have evaluated those risks, which I’m sure you have - without further evidence and looking through your sock drawer. Only you can decide what your “self-hosting obligations” are, what an “acceptable solution” is, and whether “backup to single drive has been ruled out as viable” by the backup knights of the round table, apparently. Jeez…)

I’m off to set up ZFS at the local earthquake shelter… that’ll sort it…