Copy to cloud runs 7+ hours with mostly skipped chunks

Use Case / Issue

  • Backing up to a local external drive, then performing a copy to Wasabi cloud storage
  • Local storage is currently 868GB
  • Initial copy to cloud took several weeks (which was expected)
  • A new snapshot was then created and another copy job started
  • The second copy job took over 7 hours to complete, even though the vast majority of that time was spent skipping chunks that already existed at the destination

Questions

  • Is this behavior expected?
  • Is there any way to improve the performance?

Log Entries

01262018 21:39:33 : Starting work

Source storage set to /bkup2/duplicacy
Destination storage set to s3://us-east-1@s3.wasabisys.com/duplicacy-prod
Snapshot scylla-archive1 at revision 1 already exists at the destination storage
Snapshot scylla-archive1 at revision 2 already exists at the destination storage
Chunk 81f9c5e3b7ec91b8785f32537afeae069c3662ebae2d764bd7c43416fb1044ba (1/177530) skipped at the destination
.
.
(Approx 7hrs and 20MB of log entries later...)
.
.
Chunk 9164c64534242e925382699de088531f6717f7a26b6e95895b91bd949220bf6d (177525/177530) copied to the destination
Copy complete, 177530 total chunks, 4009 chunks copied, 173521 skipped
Copied snapshot scylla-archive1 at revision 3

01272018 04:52:12 : Completed work

Which version are you running? Version 2.0.10 has a few check-ins that optimized the copy command. The issue, as shown by your log, was that it didn't do a complete listing of existing chunks on the destination storage, so it had to do a lookup for every chunk. I think this should have been fixed by 2.0.10.

I’m running v2.0.10 as downloaded from

https://github.com/gilbertchen/duplicacy/releases/download/v2.0.10/duplicacy_linux_x64_2.0.10

Sorry, I misunderstood the log message and thought that a lookup to the storage was needed for each skipped chunk. If you run duplicacy -log copy... to enable the timing information, you'll see that skipping a chunk is almost instantaneous, which means it is just checking the in-memory chunk cache.

You can speed it up by using the -threads option to increase the number of uploading threads, unless you already max out your uplink bandwidth with 1 thread.
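For example, a command along these lines (the storage names default and wasabi here are just placeholders for whatever names you gave your local and Wasabi storages):

duplicacy -log copy -from default -to wasabi -threads 8

The -log flag prefixes each line with a timestamp so you can see where the time goes, and -threads 8 runs eight uploading threads in parallel.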

I set the -log option and can see what you're referring to. It looks like the job starts off with a fair amount of read activity. Is that Duplicacy building the chunk index?

I set the -threads option to 8 and the cloud copy job completed in less than two hours which is a vast improvement. Do you have any guidance on how to set the thread count to optimize the upload?

Log Entries from last cloud copy job:

01282018 09:20:06 : Starting work

2018-01-28 09:20:06.020 INFO STORAGE_SET Source storage set to /bkup2/duplicacy
2018-01-28 09:20:06.069 INFO STORAGE_SET Destination storage set to s3://us-east-1@s3.wasabisys.com/duplicacy-prod
2018-01-28 09:20:07.135 INFO SNAPSHOT_EXIST Snapshot scylla-archive1 at revision 1 already exists at the destination storage
2018-01-28 09:20:07.285 INFO SNAPSHOT_EXIST Snapshot scylla-archive1 at revision 2 already exists at the destination storage
2018-01-28 09:20:07.324 INFO SNAPSHOT_EXIST Snapshot scylla-archive1 at revision 3 already exists at the destination storage
2018-01-28 09:23:43.810 INFO SNAPSHOT_COPY Chunk acf42232155050d5d7ed5623cc89047b97b0197906c232b1c2c646eae1a043cd (1/177516) skipped at the destination
.
.
.
2018-01-28 11:06:54.445 INFO SNAPSHOT_COPY Chunk 86f2237e2f56f9aff92800929684522ad06fca61c0011d4403c0bfc7a66ac53b (176804/177516) copied to the destination
2018-01-28 11:06:54.480 INFO SNAPSHOT_COPY Copy complete, 177516 total chunks, 1256 chunks copied, 176260 skipped
2018-01-28 11:06:55.136 INFO SNAPSHOT_COPY Copied snapshot scylla-archive1 at revision 4

01282018 11:06:55 : Completed work

I notice that the latest job copied only 1256 chunks, roughly a third of the 4009 copied by the previous job, so I am not sure how much of the improvement was a result of using 8 threads.

You can create a small file of random data (say, 1GB in size) and then upload it with different numbers of threads to figure out the optimal value. For example:

mkdir -p /path/to/test/repository
cd /path/to/test/repository
duplicacy init test s3://us-east-1@s3.wasabisys.com/bucket
dd if=/dev/urandom of=1G bs=1M count=1024
duplicacy backup -stats -threads 8
duplicacy prune -r 1 -exclusive

The last command removes the uploaded chunks quickly (otherwise it would take forever to delete the chunks directory from their website). You can then remove the config file from the storage and the .duplicacy folder from the repository, and run the same set of commands with a different number of threads.
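To make that concrete, the steps between iterations would look roughly like this (just a sketch; the bucket name, paths, and thread count are placeholders, and the 1G test file from the first run is reused):

cd /path/to/test/repository
rm -rf .duplicacy
(delete the config file from the bucket using the Wasabi console or any S3 client)
duplicacy init test s3://us-east-1@s3.wasabisys.com/bucket
duplicacy backup -stats -threads 4
duplicacy prune -r 1 -exclusive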

Okay, I ran the test as specified and the results were not encouraging. I seem to hit a ceiling after adding a second thread; adding more threads did not improve the performance. I'm on a U-verse DSL connection and wonder if I might be hitting the ceiling of my upload bandwidth, but I have not verified that.

Note: I did not remove the config file from the storage or the .duplicacy directory from the repository between backup jobs. Would that have any impact on the results?

Selected log entries:

(starting with one thread)
Storage set to s3://us-east-1@s3.wasabisys.com/dup-test
No previous backup found
Indexing /bkup/cloudtest
Listing all chunks
Uploaded chunk 1 size 12521402, 408KB/s 00:50:32 0.9%
.
.
Uploaded chunk 246 size 10085453, 423KB/s 00:00:01 100.0%
Uploaded 300.mp4 (1277703891)
Uploaded nohup.out (356)
Uploaded preferences (389)
Backup for /bkup/cloudtest at revision 1 completed
Files: 3 total, 1,218M bytes; 3 new, 1,218M bytes
File chunks: 246 total, 1,218M bytes; 246 new, 1,218M bytes, 1,220M bytes uploaded
Metadata chunks: 3 total, 18K bytes; 3 new, 18K bytes, 17K bytes uploaded
All chunks: 249 total, 1,218M bytes; 249 new, 1,218M bytes, 1,220M bytes uploaded
Total running time: 00:49:13
Storage set to s3://us-east-1@s3.wasabisys.com/dup-test
Deleting snapshot test at revision 1
The snapshot test at revision 1 has been removed
=====================================
Storage set to s3://us-east-1@s3.wasabisys.com/dup-test
No previous backup found
Indexing /bkup/cloudtest
Listing all chunks
Use 2 uploading threads
Uploaded chunk 2 size 1650268, 403KB/s 00:51:33 0.1%
.
.
Uploaded chunk 246 size 10085453, 565KB/s 00:00:01 100.0%
Uploaded 300.mp4 (1277703891)
Uploaded nohup.out (356)
Uploaded preferences (389)
Backup for /bkup/cloudtest at revision 1 completed
Files: 3 total, 1,218M bytes; 3 new, 1,218M bytes
File chunks: 246 total, 1,218M bytes; 246 new, 1,218M bytes, 1,220M bytes uploaded
Metadata chunks: 3 total, 18K bytes; 3 new, 18K bytes, 17K bytes uploaded
All chunks: 249 total, 1,218M bytes; 249 new, 1,218M bytes, 1,220M bytes uploaded
Total running time: 00:36:49
Storage set to s3://us-east-1@s3.wasabisys.com/dup-test
Deleting snapshot test at revision 1
The snapshot test at revision 1 has been removed
=====================================
Storage set to s3://us-east-1@s3.wasabisys.com/dup-test
No previous backup found
Indexing /bkup/cloudtest
Listing all chunks
Use 3 uploading threads
Uploaded chunk 2 size 1650268, 269KB/s 01:17:20 0.1%
.
.
Uploaded chunk 246 size 10085453, 568KB/s 00:00:01 100.0%
Uploaded 300.mp4 (1277703891)
Uploaded nohup.out (356)
Uploaded preferences (389)
Backup for /bkup/cloudtest at revision 1 completed
Files: 3 total, 1,218M bytes; 3 new, 1,218M bytes
File chunks: 246 total, 1,218M bytes; 246 new, 1,218M bytes, 1,220M bytes uploaded
Metadata chunks: 3 total, 18K bytes; 3 new, 18K bytes, 17K bytes uploaded
All chunks: 249 total, 1,218M bytes; 249 new, 1,218M bytes, 1,220M bytes uploaded
Total running time: 00:36:38
Storage set to s3://us-east-1@s3.wasabisys.com/dup-test
Deleting snapshot test at revision 1
The snapshot test at revision 1 has been removed
=====================================
Storage set to s3://us-east-1@s3.wasabisys.com/dup-test
No previous backup found
Indexing /bkup/cloudtest
Listing all chunks
Use 4 uploading threads
Uploaded chunk 2 size 1650268, 179KB/s 01:56:00 0.1%
.
.
Uploaded chunk 246 size 10085453, 568KB/s 00:00:01 100.0%
Uploaded 300.mp4 (1277703891)
Uploaded nohup.out (356)
Uploaded preferences (389)
Backup for /bkup/cloudtest at revision 1 completed
Files: 3 total, 1,218M bytes; 3 new, 1,218M bytes
File chunks: 246 total, 1,218M bytes; 246 new, 1,218M bytes, 1,220M bytes uploaded
Metadata chunks: 3 total, 18K bytes; 3 new, 18K bytes, 17K bytes uploaded
All chunks: 249 total, 1,218M bytes; 249 new, 1,218M bytes, 1,220M bytes uploaded
Total running time: 00:36:37
Storage set to s3://us-east-1@s3.wasabisys.com/dup-test
Deleting snapshot test at revision 1
The snapshot test at revision 1 has been removed
=====================================
(And so on up to 9 threads.  The upload time did not improve)
=====================================
Storage set to s3://us-east-1@s3.wasabisys.com/dup-test
No previous backup found
Indexing /bkup/cloudtest
Listing all chunks
Use 9 uploading threads
Uploaded chunk 6 size 1588435, 82KB/s 04:14:25 0.1%
.
.
Uploaded chunk 246 size 10085453, 558KB/s 00:00:01 100.0%
Uploaded 300.mp4 (1277703891)
Uploaded nohup.out (356)
Uploaded preferences (389)
Backup for /bkup/cloudtest at revision 1 completed
Files: 3 total, 1,218M bytes; 3 new, 1,218M bytes
File chunks: 246 total, 1,218M bytes; 246 new, 1,218M bytes, 1,220M bytes uploaded
Metadata chunks: 3 total, 18K bytes; 3 new, 18K bytes, 17K bytes uploaded
All chunks: 249 total, 1,218M bytes; 249 new, 1,218M bytes, 1,220M bytes uploaded
Total running time: 00:37:18
Storage set to s3://us-east-1@s3.wasabisys.com/dup-test
Deleting snapshot test at revision 1
The snapshot test at revision 1 has been removed

So the upload speed peaked at 565KB/s, which is roughly 4.5Mbps and therefore about what you would expect from an uplink of 4-5Mbps.

You can run a speed test on http://www.speedtest.net/result/7018670339.

Bingo. I’m maxing the upload at around 5Mbps. Google Fiber? Still waiting…