Backup failing for Google Cloud Storage

Maybe I should have just let it keep running at 12 threads. With 8 threads it eventually did the same thing (slowed down and started using only about 20% of one core). It’s probably just processing a larger file when that happens, and since only one thread can access it at a time, the rest sort of wait. So maybe there is no problem there.

The -threads option only sets the number of uploading threads. There is only one chunking thread, which splits files into chunks and then dispatches them to the uploading threads. Therefore, I doubt 8 versus 12 uploading threads makes a big difference when all the chunks have been uploaded before.
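To make the pipeline concrete, here is a minimal sketch (not Duplicacy’s actual code) of the single-producer, multi-consumer pattern described above: one goroutine produces chunks into a channel and N uploading goroutines drain it, so adding uploaders stops helping once the single chunker becomes the bottleneck.

```go
package main

import (
	"fmt"
	"sync"
)

type chunk struct {
	id   int
	data []byte
}

func main() {
	const uploadThreads = 8 // what the -threads option controls
	chunks := make(chan chunk, uploadThreads)

	var wg sync.WaitGroup
	for i := 0; i < uploadThreads; i++ {
		wg.Add(1)
		go func(worker int) {
			defer wg.Done()
			for c := range chunks {
				// a real uploader would check existence first, then upload
				fmt.Printf("worker %d handled chunk %d (%d bytes)\n", worker, c.id, len(c.data))
			}
		}(i)
	}

	// the single chunking goroutine: split the input and dispatch to workers
	for id := 0; id < 100; id++ {
		chunks <- chunk{id: id, data: make([]byte, 4<<20)}
	}
	close(chunks)
	wg.Wait()
}
```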

The number one reason to use a large chunk size is to reduce the per-chunk overhead. Duplicacy performs 2 API calls for each new chunk, one to check its existence and the other to actually upload it, so the larger the chunk, the more of the total time goes into the actual uploading rather than into that fixed per-chunk overhead.
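To put rough numbers on that overhead: the backup discussed later in this thread ends up at about 1.1 million chunks for roughly 5 TB, which at 2 API calls per chunk is on the order of 2.2 million requests; quadrupling the average chunk size would cut the chunk count, and with it the per-chunk request overhead, to roughly a quarter.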

In your case you can use multiple threads to take advantage of your fast uplink, so I don’t think you need to increase the chunk size. On the other hand, if you have many files smaller than the default chunk size that change frequently, you may consider decreasing the chunk size instead. Duplicacy may pack multiple small files into a single chunk (think of it as packing all files, large or small, into one huge tar file and then running the variable-size chunking algorithm on that tar file), so with large chunks these small files will not see the full benefit of deduplication between changes.
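As an illustration of variable-size (content-defined) chunking in general, here is a toy sketch, not Duplicacy’s actual algorithm or parameters: a boundary is declared wherever the low bits of a running hash hit zero, clamped between a minimum and maximum size, so boundaries move with the content rather than with fixed offsets.

```go
package main

import "fmt"

const (
	minChunk = 16 << 10       // 16 KB floor (deliberately tiny for the demo)
	maxChunk = 256 << 10      // 256 KB ceiling
	mask     = (64 << 10) - 1 // boundary when the low bits are all zero (~64 KB average)
)

// chunkBoundaries returns the end offsets of content-defined chunks.
func chunkBoundaries(data []byte) []int {
	var boundaries []int
	var hash uint32
	start := 0
	for i, b := range data {
		hash = hash<<1 + uint32(b) // toy hash; real chunkers use a proper rolling hash
		size := i - start + 1
		if size < minChunk {
			continue
		}
		if hash&mask == 0 || size >= maxChunk {
			boundaries = append(boundaries, i+1)
			start = i + 1
			hash = 0
		}
	}
	if start < len(data) {
		boundaries = append(boundaries, len(data))
	}
	return boundaries
}

func main() {
	// pretend this is the packed "tar-like" stream of many small files
	data := make([]byte, 1<<20)
	for i := range data {
		data[i] = byte(i*131 ^ i>>3)
	}
	fmt.Println(chunkBoundaries(data))
}
```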

That makes sense. Thanks for the explanation. Without consistent data sizes and usage patterns, I guess I’m kind of in limbo, so I’ll just stick with the default for now. If I’m able to achieve a good amount of deduplication, then it’ll be worth it. If not, I may switch to a larger chunk size and re-upload after some testing while I still have some GCP trial credit. I may create 64 GB of data in one sitting, and the faster that gets backed up the better. We’ll see.

It seems to be working now. It says it has been running since Friday and still has a day and a half left (Sunday now) with 8 threads on a Gigabit connection. Not the fastest backup I’ve tried, but very possibly the fastest I’ve tried at this deduplication level. We will see when it finishes.

2017-05-21 16:44:00.530 INFO UPLOAD_PROGRESS Skipped chunk 714171 size 16777216, 17.23MB/s 1 day 12:48:47 58.2%

Another thought occurs to me and perhaps you can provide some insight. Since we know the program is working now, how much of a speed increase do you think we would see if I turned off the detailed logging?

Even though I am using multiple threads, there has to be some point in the application where a context switch takes place to write to the log file (edit: to the screen, which I am piping to a log file). There are a number of techniques that could be used to speed this up, but in my experience something like that tends to be implemented later in the incremental development process.
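For example, one such technique is buffered, asynchronous logging; a minimal sketch (not Duplicacy’s code, just the general idea) might look like this, where progress lines go into a channel and a single goroutine writes them out, so the upload workers never block on console I/O:

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sync"
)

func main() {
	lines := make(chan string, 1024) // workers enqueue, one writer drains
	var wg sync.WaitGroup

	wg.Add(1)
	go func() {
		defer wg.Done()
		w := bufio.NewWriter(os.Stdout)
		defer w.Flush()
		for line := range lines {
			fmt.Fprintln(w, line)
		}
	}()

	// upload workers would enqueue lines like this instead of printing directly
	for i := 0; i < 10; i++ {
		lines <- fmt.Sprintf("UPLOAD_PROGRESS Uploaded chunk %d", i)
	}
	close(lines)
	wg.Wait()
}
```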

Do you have any thoughts or experience on how this would affect backup speeds for the current iteration?

PS
Thank you for your excellent level of support and putting up with my constant curiosities about the inner workings and my abrasive opinions and questions. Obviously my interest in the program you have built goes beyond getting my data backed up. I can’t wait to get my hands on the source code.

2017-05-22 10:43:54.064 INFO UPLOAD_PROGRESS Uploaded chunk 846122 size 2431489, 15.30MB/s 1 day 05:58:09 69.8%
2017-05-22 10:43:54.555 INFO UPLOAD_PROGRESS Uploaded chunk 846123 size 4517584, 15.30MB/s 1 day 05:58:09 69.8%
2017-05-22 10:43:55.305 INFO UPLOAD_PROGRESS Uploaded chunk 846124 size 5912045, 15.30MB/s 1 day 05:58:08 69.8%
2017-05-22 10:43:55.892 INFO UPLOAD_PROGRESS Uploaded chunk 846125 size 2124090, 15.30MB/s 1 day 05:58:08 69.8%
2017-05-22 10:43:56.322 INFO UPLOAD_PROGRESS Uploaded chunk 846126 size 1430229, 15.30MB/s 1 day 05:58:09 69.8%
2017-05-22 10:43:56.764 INFO UPLOAD_PROGRESS Uploaded chunk 846127 size 5855485, 15.30MB/s 1 day 05:58:08 69.8%
2017-05-22 10:43:57.201 INFO UPLOAD_PROGRESS Uploaded chunk 846128 size 1945520, 15.30MB/s 1 day 05:58:08 69.8%
2017-05-22 10:43:57.678 INFO UPLOAD_PROGRESS Uploaded chunk 846129 size 4581043, 15.30MB/s 1 day 05:58:08 69.8%
2017-05-22 10:43:58.094 INFO UPLOAD_PROGRESS Uploaded chunk 846130 size 5970921, 15.30MB/s 1 day 05:58:08 69.8%
2017-05-22 10:43:58.406 INFO UPLOAD_PROGRESS Uploaded chunk 846131 size 2172488, 15.30MB/s 1 day 05:58:08 69.8%
2017-05-22 10:43:58.806 INFO UPLOAD_PROGRESS Uploaded chunk 846132 size 2962683, 15.30MB/s 1 day 05:58:07 69.8%
2017-05-22 10:43:59.287 INFO UPLOAD_PROGRESS Uploaded chunk 846133 size 4371412, 15.30MB/s 1 day 05:58:07 69.8%
2017-05-22 10:43:59.678 INFO UPLOAD_PROGRESS Uploaded chunk 846134 size 4819426, 15.30MB/s 1 day 05:58:07 69.8%
2017-05-22 10:44:00.157 INFO UPLOAD_PROGRESS Uploaded chunk 846135 size 5437442, 15.30MB/s 1 day 05:58:07 69.8%
2017-05-22 10:44:01.048 INFO UPLOAD_PROGRESS Uploaded chunk 846136 size 13229446, 15.30MB/s 1 day 05:58:06 69.8%
2017-05-22 10:44:01.449 INFO UPLOAD_PROGRESS Uploaded chunk 846137 size 4054628, 15.30MB/s 1 day 05:58:06 69.8%
2017-05-22 10:44:01.900 INFO UPLOAD_PROGRESS Uploaded chunk 846138 size 1072854, 15.30MB/s 1 day 05:58:06 69.8%
2017-05-22 10:44:02.522 INFO UPLOAD_PROGRESS Uploaded chunk 846139 size 7562051, 15.30MB/s 1 day 05:58:05 69.8%
2017-05-22 10:44:02.994 INFO UPLOAD_PROGRESS Uploaded chunk 846140 size 2883132, 15.30MB/s 1 day 05:58:05 69.8%
2017-05-22 10:44:03.398 INFO UPLOAD_PROGRESS Uploaded chunk 846141 size 3689545, 15.30MB/s 1 day 05:58:05 69.8%
2017-05-22 10:44:03.753 INFO UPLOAD_PROGRESS Uploaded chunk 846142 size 1576206, 15.30MB/s 1 day 05:58:05 69.8%
2017-05-22 10:44:04.233 INFO UPLOAD_PROGRESS Uploaded chunk 846143 size 3350373, 15.30MB/s 1 day 05:58:05 69.8%
2017-05-22 10:44:04.584 INFO UPLOAD_PROGRESS Uploaded chunk 846144 size 2011109, 15.30MB/s 1 day 05:58:05 69.8%
2017-05-22 10:44:04.928 INFO UPLOAD_PROGRESS Uploaded chunk 846145 size 2492654, 15.30MB/s 1 day 05:58:05 69.8%
2017-05-22 10:44:05.260 INFO UPLOAD_PROGRESS Uploaded chunk 846146 size 1675592, 15.30MB/s 1 day 05:58:05 69.8%

I don’t think detailed logging would slow down the backup to a noticeable degree, although I don’t have data to back up that claim.

The upload speed is a bit slow for a 1 Gbps uplink. I’ll add performance profiling to my to-do list.

The source code is now available from the GitHub repository.

Oh how exciting. I’ll have to wait to look at that until this weekend most likely.

FACEPALM But it’s so close…

2017-05-23 11:59:28.582 INFO UPLOAD_PROGRESS Uploaded chunk 1112591 size 10407000, 14.82MB/s 07:56:32 92.2%
2017-05-23 11:59:29.279 INFO UPLOAD_PROGRESS Uploaded chunk 1112592 size 5680803, 14.82MB/s 07:56:32 92.2%
2017-05-23 11:59:29.579 INFO UPLOAD_PROGRESS Uploaded chunk 1112593 size 3775219, 14.82MB/s 07:56:32 92.2%
2017-05-23 11:59:29.907 INFO UPLOAD_PROGRESS Uploaded chunk 1112594 size 3586406, 14.82MB/s 07:56:32 92.2%
2017-05-23 11:59:30.399 INFO UPLOAD_PROGRESS Uploaded chunk 1112595 size 10377805, 14.82MB/s 07:56:31 92.2%
2017-05-23 11:59:30.529 ERROR CHUNK_MAKER Failed to read 0 bytes: read E:\/.duplicacy\shadow\\/Shares/Access/VirtualMachines/MINTMINE/MINTMINE-Snapshot1.vmem: The device is not ready.
2017-05-23 11:59:30.669 INFO UPLOAD_PROGRESS Uploaded chunk 1112596 size 1966714, 14.82MB/s 07:56:31 92.2%
2017-05-23 12:01:01.273 INFO VSS_DELETE The shadow copy has been successfully deleted

There is probably no need to back up the .vmem file. I’m not sure if that was a VSS issue (i.e., the VM hypervisor didn’t implement the VSS writer correctly) or if that kind of file is simply not supposed to be readable even with VSS enabled. In any case, you can add an exclude pattern -*.vmem to .duplicacy/filters and try again.
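In other words, .duplicacy/filters would contain a line like:

```
-*.vmem
```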

I suppose this will work as a temporary solution. None of my other backup solutions have had a problem with this though. I don’t know how they handle it.

Is that a VMware Workstation file? I’ll try it myself.

It is. Based on the file name, it’s from when I took a snapshot of the machine. The .vmem file is probably a snapshot of the RAM, since the machine was running at the time.

How deflating

2017-05-25 15:37:28.912 INFO UPLOAD_PROGRESS Uploaded chunk 1056557 size 11039277, 51.90MB/s 01:18:36 95.2%
2017-05-25 15:37:29.321 INFO UPLOAD_PROGRESS Uploaded chunk 1056558 size 2101654, 51.90MB/s 01:18:36 95.2%
2017-05-25 15:37:30.374 INFO UPLOAD_PROGRESS Uploaded chunk 1056559 size 10895912, 51.90MB/s 01:18:36 95.2%
2017-05-25 15:37:30.793 INFO UPLOAD_PROGRESS Uploaded chunk 1056560 size 4828920, 51.90MB/s 01:18:36 95.2%
2017-05-25 15:37:31.161 INFO UPLOAD_PROGRESS Uploaded chunk 1056561 size 2174038, 51.90MB/s 01:18:36 95.2%
2017-05-25 15:37:31.553 INFO UPLOAD_PROGRESS Uploaded chunk 1056562 size 3060655, 51.90MB/s 01:18:36 95.2%
2017-05-25 15:37:34.836 ERROR UPLOAD_CHUNK Failed to find the path for the chunk d470effc0ed2eacb3f904e8dfa3ee3baddf93681dc2c5e25f5f2cf4fbe89dc4b: Get https://www.googleapis.com/storage/v1/b/[redacted]/o/[redacted]%2Fchunks%2Fd470effc0ed2eacb3f904e8dfa3ee3baddf93681dc2c5e25f5f2cf4fbe89dc4b?alt=json&projection=full: unexpected EOF

So this unexpected EOF needs to be caught and retried. It is likely a temporary error and may not happen if you run the backup again. Still, I think it is really annoying that the backup has to start from the beginning, even though skipping already-uploaded chunks is much faster than actually uploading them. So the next major task for me is to implement fast resume; my plan is to have it done by the end of next week.
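The retry itself is the standard pattern; a minimal sketch (not Duplicacy’s actual code, and lookupChunk here is only a placeholder for the storage call that failed above) would be:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var errUnexpectedEOF = errors.New("unexpected EOF")

// lookupChunk stands in for the "find the path for the chunk" storage call;
// here it always fails so the retry loop is visible.
func lookupChunk(id string) error {
	return errUnexpectedEOF
}

func lookupWithRetry(id string, attempts int) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = lookupChunk(id); err == nil {
			return nil
		}
		wait := time.Duration(1<<uint(i)) * time.Second // exponential backoff
		fmt.Printf("attempt %d failed (%v), retrying in %v\n", i+1, err, wait)
		time.Sleep(wait)
	}
	return err
}

func main() {
	if err := lookupWithRetry("d470effc...", 4); err != nil {
		fmt.Println("giving up:", err)
	}
}
```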

I would also suggest that you create a separate repository for your virtual machines that backs up to a different storage location. For that storage you can set all three chunk size parameters to the same value, which enables the fixed-size chunking algorithm; it should be much faster than the variable-size chunking algorithm. We have a backup tool called Vertical Backup that does exactly that for VMware ESXi.
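Initializing such a storage would look something like this (the bucket, path, and snapshot ID are placeholders; check the init command’s help for the exact flags):

```
duplicacy init -c 16M -min 16M -max 16M vm-backups gcs://my-bucket/duplicacy-vms
```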

Agreed. It ran for 18 hours, only for a temporary internet outage to make me start it again. Then it took over 24 hours just to run into this error at 95%. My server has been threatening to restart for updates too… which I keep cancelling via the command line.

I can create a separate backup set for my VMs, but they don’t really change often. These are mostly backups of VMs that I no longer use from an old version of VMware Workstation that I got when I was in college. I might pull one down and fire it up to rebuild a discontinued piece of software that I still use, but for the most part they won’t change. Most are growing disks with Linux on them, so they aren’t too big. It may not be a bad idea to pull them out into a separate backup set, but I have a feeling that once the initial backup is complete, it won’t really slow my backups much even if I change one of them… unless you know something I don’t.

I still use VMs for dev environments when I start a home project, so I will surely use a separate backup set for any of those (none right now); it’s good to know to set them to a fixed chunk size for speed. I don’t know if I want to use a separate piece of software though. Sounds like too much overhead for home use. Plus I’m not using ESXi.

It finally finished. I started running it again. It looks like it is rechecking everything. Is this always going to be the case?

2017-05-27 22:34:49.190 INFO UPLOAD_PROGRESS Skipped chunk 298338 size 5996574, 70.00MB/s 15:12:37 26.1%
2017-05-27 22:34:49.245 INFO UPLOAD_PROGRESS Skipped chunk 298339 size 4719426, 70.00MB/s 15:12:37 26.1%
2017-05-27 22:34:49.267 INFO UPLOAD_PROGRESS Skipped chunk 298340 size 2166734, 70.00MB/s 15:12:37 26.1%
2017-05-27 22:34:49.293 INFO UPLOAD_PROGRESS Skipped chunk 298341 size 2612248, 70.00MB/s 15:12:37 26.1%
2017-05-27 22:34:49.360 INFO UPLOAD_PROGRESS Skipped chunk 298342 size 5740648, 70.00MB/s 15:12:37 26.1%
2017-05-27 22:34:49.425 INFO UPLOAD_PROGRESS Skipped chunk 298343 size 5551157, 70.00MB/s 15:12:36 26.1%
2017-05-27 22:34:49.467 INFO UPLOAD_PROGRESS Skipped chunk 298344 size 3867387, 70.00MB/s 15:12:36 26.1%
2017-05-27 22:34:49.479 INFO UPLOAD_PROGRESS Skipped chunk 298345 size 1379893, 70.00MB/s 15:12:36 26.1%
2017-05-27 22:34:49.551 INFO UPLOAD_PROGRESS Skipped chunk 298346 size 5980379, 70.00MB/s 15:12:36 26.1%
2017-05-27 22:34:49.577 INFO UPLOAD_PROGRESS Skipped chunk 298347 size 2464162, 70.00MB/s 15:12:36 26.1%
2017-05-27 22:34:49.656 INFO UPLOAD_PROGRESS Skipped chunk 298348 size 6188076, 70.00MB/s 15:12:35 26.1%
2017-05-27 22:34:49.673 INFO UPLOAD_PROGRESS Skipped chunk 298349 size 1274697, 70.00MB/s 15:12:35 26.1%
2017-05-27 22:34:49.698 INFO UPLOAD_PROGRESS Skipped chunk 298350 size 1805073, 70.00MB/s 15:12:35 26.1%
2017-05-27 22:34:49.766 INFO UPLOAD_PROGRESS Skipped chunk 298351 size 3877210, 70.00MB/s 15:12:35 26.1%

By default the backup command should skip files having the same timestamp AND size as the previous version. Once the second backup finishes, you should be able to see which files were uploaded again. If the timestamp and size didn’t change but a file got uploaded again, then it is a bug.
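A minimal sketch of that default skip rule (not Duplicacy’s code, just the decision being described):

```go
package main

import "fmt"

// fileEntry is a stand-in for the metadata kept per file in a snapshot.
type fileEntry struct {
	path    string
	size    int64
	modTime int64 // unix seconds
}

// unchanged reports whether a file can be skipped: same size AND same timestamp.
func unchanged(prev, curr fileEntry) bool {
	return prev.size == curr.size && prev.modTime == curr.modTime
}

func main() {
	prev := fileEntry{"docs/report.txt", 1024, 1495500000}
	curr := fileEntry{"docs/report.txt", 1024, 1495500000}
	fmt.Println("skip re-reading:", unchanged(prev, curr))
}
```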

It took 22 hours for the second run…

2017-05-28 15:45:24.351 INFO BACKUP_STATS All chunks: 1106654 total, 5071G bytes; 166 new, 873,922K bytes, 233,652K bytes uploaded
2017-05-28 15:45:24.351 INFO BACKUP_STATS Total running time: 22:42:22

Are you using the -hash option? If so, Duplicacy will check every file regardless of its timestamp and size.

Ah. I did have that option in there. I’m running it from a batch file and completely overlooked that. User error. Ha.
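For reference, the backup line without -hash ends up something along these lines (illustrative; -vss and -threads 8 match the run above, and -stats is assumed for the BACKUP_STATS summary):

```
duplicacy backup -stats -vss -threads 8
```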

Much better. I guess all I have left to do is schedule the backups and figure out how I can get e-mail notification of failures. That and check out the source code at some point. Thanks for all of your help.

2017-05-30 11:08:19.635 INFO BACKUP_STATS All chunks: 1106718 total, 5071G bytes; 69 new, 369,819K bytes, 77,936K bytes uploaded
2017-05-30 11:08:19.635 INFO BACKUP_STATS Total running time: 00:09:28