Your screenshot showed one backup failed while the other one was running. Is the other one still running? If so, it could be that Google is rate-limiting your API calls while another backup is running. I would wait until that backup finishes before starting a new one.
Like I said in that same reply, no, I am only running one at a time. In the past, I have run more than one at a time and Google connects just fine - but only on the drives that work. I have three specific drives that fail no matter how I run them. I'm just trying to figure out why those three drives fail every time while the other five never fail.
The logs I have uploaded here are all from backups run alone, with nothing else running.
The error occurs during the chunk listing phase. Only an initial backup needs to list the existing chunks in the storage. There could be two reasons why your other backups work: either they don't have as many chunks in the storage, or they are not initial backups (that is, they have a revision number greater than 1).
I don't know why Google's servers keep closing the connections if you're not running other backups at the same time. Can you check the memory usage? High memory usage could significantly slow down the API calls, which may cause Google to give up too quickly.
-
Does that mean that there’s nothing I can do since I can’t change the number of chunks it’s trying to list? (the other drives that work are much bigger - does that make a difference in the number of chunks it tries to list? can I move stuff around to different drives to fix that?)
-
How do I check memory usage?
Do I understand correctly that any initial backup of a new repository into existing storage will need to fetch a list of all chunks? This can be a ridiculous number of files, and unless the listing can be done in batches (can it?), it would be prone to failure, effectively meaning you cannot add another repo to a huge datastore.
Why is listing chunks needed in the first place? Why is an initial backup treated any differently than subsequent ones? It seems that if it proceeded just like a backup into an empty datastore it would just work: duplicate chunks won't get re-uploaded.
Am I missing something?
On macOS you can run `top` in Terminal to find out the memory usage. The thing to look for is that the amount of memory used by the Duplicacy CLI executable should be less than the amount of physical memory.
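For example, something like this should work (assuming the CLI process shows up as `duplicacy`; the exact name may differ depending on how you installed it):

```sh
# Find the Duplicacy process and show its PID
pgrep -l duplicacy

# Watch just that process, refreshing every 2 seconds
top -pid $(pgrep -n duplicacy) -s 2

# Or sort all processes by memory so the heaviest ones are listed first
top -o mem
```

The MEM column is the number to watch; compare it against your physical memory.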
Listing chunks is needed so an initial backup doesn't have to check the existence of each chunk before uploading it. A subsequent backup assumes that all chunks referenced by the previous backup still exist.
We could have made Duplicacy continue the backup even after a listing error, but in this case it wouldn't help much: it took more than 8 hours to list the chunks directory before failing, which means there must be something wrong.
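To give a sense of why a single listing error is fatal, here is a minimal sketch of a paginated Google Drive listing, using the Drive v3 Go client. This is an illustration only, not Duplicacy's actual source; the folder ID and function name are made up:

```go
package listing

import "google.golang.org/api/drive/v3"

// listAllChunks pages through every file under the storage's chunks folder.
// Each page is a separate API call, so a listing with millions of entries
// means thousands of sequential requests.
func listAllChunks(srv *drive.Service, folderID string) ([]string, error) {
	var names []string
	pageToken := ""
	for {
		call := srv.Files.List().
			Q("'" + folderID + "' in parents").
			Fields("nextPageToken, files(name)").
			PageSize(1000)
		if pageToken != "" {
			call = call.PageToken(pageToken)
		}
		page, err := call.Do()
		if err != nil {
			// One dropped connection here aborts the whole phase,
			// even hours and millions of entries in.
			return nil, err
		}
		for _, f := range page.Files {
			names = append(names, f.Name)
		}
		if page.NextPageToken == "" {
			return names, nil
		}
		pageToken = page.NextPageToken
	}
}
```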
Wouldn't it be better to check each chunk for existence before uploading it? That would spread out the workload and hide the initial latency entirely.
Also, in the second log the failure happened after 30 seconds, not 8 hours.
So, I guess I just can’t backup these specific drives?
Did you figure out what the memory usage was during the chunk listing phase?
How do I do that? Can you give me a little tutorial? I see that I can run "top" in Terminal, but I don't know how to use it. I know how to open Terminal and run commands that I copy and paste from online, but what do I type to find out the memory usage for Duplicacy? Do I run a backup for that drive and then open Terminal and type "top"? And if the failure happens after 8 hours (or 30 seconds, as saspus points out), when would I run it to catch the exact moment that memory usage is too high? Or does it just list the memory usage continuously while it's running?
Okay, I ran it and I see a bunch of programs listed. It looks like the information in Activity Monitor on the Mac; is that the same thing? I can't really read what I'm seeing, lots of stuff is happening. Here is what it looks like in Activity Monitor under CPU:
And here is what it looks like under Memory:
I currently have it backing up two of the drives that work.
For reference, I have 32 GB of memory.
I got a new hard drive, so I wanted to test whether reformatting one of the drives that isn't working would fix the problem. I moved everything over, put a couple of files on the old drive, and ran it. It failed again; here is the log. It looks like all the same stuff. I just can't figure out what is happening with these specific drives. Why is Google letting me back up stuff from some hard drives but not others?
log.txt (98.7 KB)
EDIT: I just tried to back up the newest drive I purchased and added today, and it is also failing with the same results. Honestly, it seems like drives that I have already completed backups on work, and drives that haven't been backed up before fail. It seems like I just can't add any more drives to Duplicacy.
I am testing out Arq Backup right now on the non-working drives to see if they will upload with other software. I tried Duplicati, but I'm going with Arq because I know it has worked in the past.
It is not that this drive is any different from the other drives. This backup is failing because only for this backup does Duplicacy have to list the existing chunks in the storage. It has to do that because this is an initial backup, and for an initial backup Duplicacy is designed to find out which chunks the storage already has, so that it can avoid a lookup for every chunk to be uploaded. It doesn't need to do this for subsequent backups because it can use the last finished backup as the baseline.
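In code terms, the difference between the two cases looks roughly like this (a simplified sketch, not Duplicacy's actual source; the function and parameter names are made up for illustration):

```go
package backup

// knownChunks returns the set of chunk IDs a backup may assume already
// exist in the storage, so it can skip uploading them.
func knownChunks(listAll func() (map[string]bool, error), lastRevision []string) (map[string]bool, error) {
	if lastRevision == nil {
		// Initial backup (revision 1): there is no previous snapshot to
		// use as a baseline, so the entire chunks directory is listed
		// up front instead of doing a lookup per chunk.
		return listAll()
	}
	// Subsequent backup: every chunk referenced by the last finished
	// snapshot is assumed to still exist; no listing is needed.
	known := make(map[string]bool, len(lastRevision))
	for _, id := range lastRevision {
		known[id] = true
	}
	return known, nil
}
```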
I still suspect that this has something to do with rate limiting because of the volume you’re uploading. Google is known to impose a few limits on the API calls. Normally it would return 403 errors when the rate limit is exceeded, but maybe closing the connections unexpectedly is another undocumented form of rate limiting.
Can you add `-d` to your other backup jobs and check the logs to see if there are any closed-connection messages? Another thing you can try is to stop all other backup jobs for one day and then start this backup job first.
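If you run the CLI from Terminal, `-d` is a global option, so it goes before the command. A typical invocation (the path here is just a placeholder for your own repository):

```sh
cd /path/to/repository
duplicacy -d -log backup -stats
```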
I have `-d` running on all the backups. Here are some logs from backups that work (though I aborted them while I was testing things): log1.txt (236.8 KB) log2.txt (185.5 KB)
It does look like there are some errors on the new chunks (if I'm reading it correctly), but those do complete eventually. I am also trying Arq Backup and I'm getting SSL errors there (that's what it calls them). I am going to try waiting a day, as you suggested; maybe that will reset the limits, unless you see something in the logs that we haven't already seen a bunch.
So, I tried the drive with Arq Backup and it gave some of the same kinds of errors, but since Arq backs up files as it scans, it was able to upload some of them; when the scan finished, I restarted it and it uploaded the rest.