Copy command slow between separate internal drives?

I’ve created a second storage location for a repo on a second disk in the same machine (reorganizing disks) and am using the copy command to clone the data over, but it seems much slower than it should be. I’m getting around 20 MB/s, sometimes closer to 30 if I’m lucky. Am I crazy, or should this not be a lot faster given the hardware / setup below?

Copied chunk 7bdf2151a594977e372574439f4873b514881649a562397f87d23922275c5202 (101/394334) 21.47MB/s 20:39:38 0.0%
Copied chunk 2cd39a1a842563cfb71fd11a5ee24b39b7b5c0c252a01e5c9cf80a587a2266ea (93/394334) 22.19MB/s 22:33:21 0.0%
Copied chunk 5bc24e48276570bd1f5741c21e01aa29a4a76bf86dece8e662c4b521d958ddd2 (102/394334) 22.09MB/s 20:44:14 0.0%
Copied chunk 7a7f09ef948a3d691177212320f8092b6217f51c8fc217605c973726f08ba456 (99/394334) 22.28MB/s 21:32:10 0.0%
Copied chunk 9d58c39d0f2a9fae166eae3b9539570c184e0d768f81429b982b85ce58ad95d6 (108/394334) 21.45MB/s 20:34:24 0.0%
Copied chunk ff2333cd804a617ac8967c2ca32c44868efb6581a05ca80fc09bd12db75f3a08 (104/394334) 21.17MB/s 21:57:25 0.0%
Copied chunk 5ceca5a158b52132d1f8504676fbc6ff661f02fe4dd85b03fb822d31ffbfa145 (103/394334) 20.90MB/s 22:55:39 0.0%
Copied chunk 92fb92d5966181f6991377e990f1510ea5222aebd10628eb950c9d298f04882a (110/394334) 20.99MB/s 21:37:09 0.0%
Copied chunk 0cedfdb326de5dc00224a6cedf3b17da85c669f454b04e3c770d8b596d66bf70 (109/394334) 21.16MB/s 21:53:55 0.0%

Source:

  • 7200 rpm HDD that gets read speeds over 200 MB/s in a dd test (sketch below)
  • Unencrypted duplicacy storage with default settings for chunk size and everything else
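
For reference, the dd read test was along these lines (the device path and sizes here are placeholders, not the exact invocation):

# sequential read straight off the source disk, bypassing the page cache
dd if=/dev/sdb of=/dev/null bs=1M count=4096 iflag=direct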

Target:

  • Unraid HDD array in the same machine that gets write speeds just under 200 MB/s at the same time as the read test is running (so I know it’s not a throughput issue on the SATA controller)
  • Encrypted duplicacy storage that had the -copy param set to the source storage above (see the sketch below)
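
The encrypted target was added as a copy-compatible storage, roughly like this (reconstructed from memory; the storage name and path match the preferences file below):

# add the encrypted target, copy-compatible with the source storage
duplicacy add -e -copy default INT_BACKUP_ENC_TEMP archive /mnt/user/local/INT_BACKUP_ENC_TEMP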

CPU:
Intel® Core™ i3-7100 CPU @ 3.90GHz; usage is less than 50% while the copy is running

Command:
duplicacy copy -threads 8 -from default -to INT_BACKUP_ENC_TEMP
(aware 8 threads won’t do anything extra on a 4-thread machine, but I was kind of grasping at straws)

# cat /mnt/user/archive/.duplicacy/preferences 
[
    {
        "name": "default",
        "id": "archive",
        "repository": "",
        "storage": "/mnt/disks/INT_BACKUP/UNRAID_BACKUPS_DUPLICACY/archive_and_ReplicateOut",
        "encrypted": false,
        "no_backup": false,
        "no_restore": false,
        "no_save_password": false,
        "nobackup_file": "",
        "keys": null,
        "filters": "",
        "exclude_by_attribute": false
    },
    {
        "name": "INT_BACKUP_ENC_TEMP",
        "id": "archive",
        "repository": "",
        "storage": "/mnt/user/local/INT_BACKUP_ENC_TEMP",
        "encrypted": true,
        "no_backup": false,
        "no_restore": false,
        "no_save_password": false,
        "nobackup_file": "",
        "keys": null,
        "filters": "",
        "exclude_by_attribute": false
    }
]

Testing:

  • Nothing else is using the disks outside of duplicacy
  • rsync can copy at a sustained rate of around 165 MB/s (large file) or 80 MB/s for a bunch of smaller / medium files like documents, etc. (sketch below)
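
The rsync baseline was a plain local copy, something like this (paths are illustrative):

rsync -a --info=progress2 /mnt/disks/INT_BACKUP/testdata/ /mnt/user/local/rsync_test/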

Finally… Is it strictly due to the many small chunk files? Even allowing for that, it seems quite slow, since at one point I was copying chunk files to another machine across the network at 50–75 MB/s.

Honestly, 20MB/s isn’t too crazy bad, but all those extra threads on local storage are probably strangling things. For local storage (HDD) to local storage (HDD), I normally choose just 1 thread, because otherwise you’re gonna get a lot of disk thrashing and the parallelism adds more overhead than the I/O it manages to overlap. Try 1 or 2 threads and see how it goes.
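
I.e. the same command you already have, just with the thread count dialled down:

duplicacy copy -threads 1 -from default -to INT_BACKUP_ENC_TEMP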

With SSD it’d be different, but you still gotta remember a Duplicacy copy job is doing more than just shifting bits - it’s unpacking and repacking each chunk.

At the moment I get around 40MB/s over LAN using 4 threads (TBH I’d normally use 2 for ssh, but this NAS box was originally off-site and I cba to reconfigure)… on a 9th gen i3.

Remember, once the initial copy is done, subsequent copies should be quicker (although if you have a LOT of chunks, the scan phase can take a bit longer). Copies are the sort of thing that’s best automated and left to complete in its own time.
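
E.g. a cron entry so it runs overnight and you don’t have to babysit it (the schedule and log path here are just an example):

# run the copy nightly at 03:00 from the repository directory, appending output to a log
0 3 * * * cd /mnt/user/archive && duplicacy copy -threads 1 -from default -to INT_BACKUP_ENC_TEMP >> /var/log/duplicacy_copy.log 2>&1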

Edit: If you absolutely need faster, make a copy-compatible storage with -bit-identical and you can rsync/rclone sync instead, but then you’d lose the benefit of the extra integrity checks, and of being able to copy a subset of snapshots or keep differently pruned revisions - you’d then want to run a regular check -chunks.
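
Roughly like this (the "offsite" storage name and paths are placeholders; both storages have to share the same config for the chunk files to come out byte-identical):

# new storage shares its config with INT_BACKUP_ENC_TEMP, so chunk files match byte-for-byte
duplicacy add -e -copy INT_BACKUP_ENC_TEMP -bit-identical offsite archive /path/to/offsite/storage
# a plain file-level sync is then valid
rsync -a /mnt/user/local/INT_BACKUP_ENC_TEMP/ /path/to/offsite/storage/
# duplicacy never validated those chunks on the way over, so check them explicitly
duplicacy check -storage offsite -chunks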

Thanks. I was originally at 4 threads to match the CPU, but 8, 2, or even 1 gives basically the same results, though 1 thread seems to be a bit more consistent. Currently maxed out at about 27 MB/s with 1 thread, whereas higher thread counts would start around 30 MB/s, give or take, and after a long period seemed to settle around 18 MB/s. Is what it is, I guess.

The speed is more just an annoyance, since I’m doing the initial copy for each of the backup drives / devices described in this thread, and I’d like to be done with this so I can move on to other parts of the project (or other projects in general). Waiting nearly 24 hours for a single copy to finish before I can move on to the next step sucks. A very ‘hurry up and wait’ situation. Oh well.

Thanks again for your input and detailed response!

Related question(s) just came up…

One of the reasons I was doing the copy was to do a sort of pseudo re-init of the storage on a stand-alone drive, since I wanted to add encryption and erasure coding. The plan was to copy stand-alone HDD → main array (w/ encryption enabled) → stand-alone HDD (w/ encryption & EC enabled). I naively assumed that this would be faster than re-creating the whole backup from scratch on the stand-alone drive, since I didn’t realize the overhead involved in the copy command in this case (mostly the rehashing / encryption, which now makes sense in hindsight?).
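
For context, the last hop of that plan would look something like this (the storage name, path, and 5:2 shard ratio are placeholders, not settings from this thread):

# encrypted + erasure-coded storage, copy-compatible with the array storage
duplicacy add -e -erasure-coding 5:2 -copy INT_BACKUP_ENC_TEMP STANDALONE_ENC_EC archive /mnt/disks/standalone/storage
duplicacy copy -threads 1 -from INT_BACKUP_ENC_TEMP -to STANDALONE_ENC_EC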

More importantly, and somewhat ironically related (re: erasure coding, I think), the copy operation keeps hitting this…

The chunk 81e0380225980e2fc167949c032822d9b056fd5e952d51d0f83b7d1e411b342e has a hash id of c48a120476906ce6c16e0f30f0f2cb3b8b8a11e1855adcc076f4a08eeb07fafa; retrying

That, combined with the copy overhead, means I should probably just wipe the stand-alone drive and re-create the backup from scratch.
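
I.e. skipping the copy entirely and backing up straight to a fresh storage, something like this (name, path, and shard ratio again placeholders):

# fresh storage with encryption + erasure coding, then a from-scratch backup
duplicacy add -e -erasure-coding 5:2 STANDALONE_ENC_EC archive /mnt/disks/standalone/storage
duplicacy backup -storage STANDALONE_ENC_EC -threads 1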

But it has me wondering: what if this same hash id issue happens when copying from the “main” backup to my off-site backup in the future? What’s the recommended process in that scenario? Something like this, or is there a better process?
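
(I’m assuming the first step would be a full chunk check against the source storage to surface bad chunks before the copy even starts, but I’d like to confirm:)

duplicacy check -storage default -chunks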

One follow-up. I’ve now run into this too, which seems odd considering the source storage isn’t encrypted but the destination is. Why would it say it failed to decrypt when it should just be encrypting to the new storage? Or does it mean decompress?

Copied chunk 7e284088254abeaf9357a05112fd9ee8f6924580f1e4a0564e47f1f78be61127 (5089/73895) 24.62MB/s 03:52:03 6.9%
Copied chunk c1ebaf309d9647d29151aaa39ce6032901648171fd012f4f9c78a070bddc6da8 (5090/73895) 24.62MB/s 03:52:01 6.9%
Failed to decrypt the chunk 11ea2cedca15a76bea9630c5f769682a3d0cafaebe1568ed9d2ae77021dff460: zlib: invalid header; retrying

Never encountered such errors before.

Did you start from scratch when trying to change encryption and erasure coding on the storage? No idea how that would be possible without deleting the config file, though - maybe you had a mix of erasure-coded and non-encoded chunks left over? That would be my guess.

@gchen Any idea about this?