Cannot “check -chunks” — socket: too many open files

Please describe what you are doing to trigger the bug:
Following the advice in Cannot "check -chunks" -- fatal error: concurrent map read and map write - #8 by sevimo, I added -persist to my check command. This gets me a little farther along, but after a couple of days it eventually stops working as well.

Please describe what you expect to happen (but doesn’t):
Duplicacy continues to check chunks, making progress towards the goal.

Please describe what actually happens (the wrong behaviour):
Duplicacy eventually gets stuck, and the output turns into nothing but “socket: too many open files” messages like this:

:53: socket: too many open files; retrying after 28.83 seconds
:53: socket: too many open files; retrying after 28.83 seconds
:53: socket: too many open files; retrying after 28.83 seconds
:53: socket: too many open files; retrying after 28.83 seconds
:53: socket: too many open files; retrying after 28.83 seconds
:53: socket: too many open files; retrying after 43.25 seconds
:53: socket: too many open files; retrying after 43.25 seconds

Full output: too-many-open-files.txt (Dropbox)

`lsof | grep duplicacy | wc -l` produces 10271.

It seems like Duplicacy is leaking sockets or file descriptors. I noticed that Dropbox sends back a lot of 429 responses, which causes Duplicacy to back off. Perhaps sockets or file handles are not being cleaned up properly in this or some other error scenario?
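
For reference, one way to watch the descriptor count over time without rerunning lsof is a tiny Go helper like the sketch below. This is only an illustration, not part of Duplicacy; it assumes a Linux-style /proc filesystem, and the PID argument and 30-second interval are arbitrary choices.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: fdcount <pid>")
		os.Exit(1)
	}
	pid := os.Args[1]
	for {
		// Each entry in /proc/<pid>/fd is one open descriptor (file, socket, pipe, ...).
		entries, err := os.ReadDir("/proc/" + pid + "/fd")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("%s  open fds: %d\n", time.Now().Format(time.RFC3339), len(entries))
		time.Sleep(30 * time.Second)
	}
}
```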

I also see some other interesting things in the output, like:

Chunk 6f1b07ab7abdce69301feb26c441152de1e0bd76ed6d6165624172258e16ab4d can't be found
Verified chunk 6f1b07ab7abdce69301feb26c441152de1e0bd76ed6d6165624172258e16ab4d (582925/3267815), 18.84MB/s 8 days 05:35:33 17.8%
:53: socket: too many open files; retrying after 0.50 seconds
:53: socket: too many open files; retrying after 0.75 seconds
:53: socket: too many open files; retrying after 1.12 seconds
:53: socket: too many open files; retrying after 0.50 seconds
:53: socket: too many open files; retrying after 0.75 seconds
:53: socket: too many open files; retrying after 1.12 seconds
Chunk b7505b5f57e85be58a7725597780a8e60cbeade066eb39d67b0eb5129b44fc0c can't be found
Verified chunk b7505b5f57e85be58a7725597780a8e60cbeade066eb39d67b0eb5129b44fc0c (582926/3267815), 18.84MB/s 8 days 05:35:35 17.8%
:53: socket: too many open files; retrying after 0.50 seconds
:53: socket: too many open files; retrying after 0.75 seconds
:53: socket: too many open files; retrying after 1.12 seconds
:53: socket: too many open files; retrying after 2.53 seconds
Chunk 6b603b4ad722d325a255ca8644ceb0e99c598863d5cad680df9f93b02c9dd4f5 can't be found
Verified chunk 6b603b4ad722d325a255ca8644ceb0e99c598863d5cad680df9f93b02c9dd4f5 (582927/3267815), 18.84MB/s 8 days 05:35:34 17.8%
:53: socket: too many open files; retrying after 0.50 seconds
:53: socket: too many open files; retrying after 0.75 seconds
:53: socket: too many open files; retrying after 1.12 seconds
:53: socket: too many open files; retrying after 1.69 seconds
:53: socket: too many open files; retrying after 1.69 seconds
:53: socket: too many open files; retrying after 1.69 seconds
:53: socket: too many open files; retrying after 1.69 seconds
:53: socket: too many open files; retrying after 2.53 seconds
Chunk 1df11682a8b5265794ec7a0da74539cec732b8be905f98201bfc5bd571738de4 can't be found
Verified chunk 1df11682a8b5265794ec7a0da74539cec732b8be905f98201bfc5bd571738de4 (582928/3267815), 18.84MB/s 8 days 05:35:38 17.8%

Surely these chunks cannot be both missing and verified at the same time.

I think I’ve found the problem. When retrying on 401/429/500, go-dropbox does not close the HTTP response body, so the underlying connections eventually leak. I can reproduce the problem fairly reliably with `duplicacy check -chunks -threads 8`, because Dropbox asks the client to back off after a while, producing a significant number of 429s. When this happens, lsof reports a lot of (half?) open sockets like this:

duplicacy 38553 38561 duplicacy            drz    8u     sock                0,9      0t0    3408912 protocol: TCPv6
duplicacy 38553 38561 duplicacy            drz    9u     sock                0,9      0t0    3408977 protocol: TCPv6
duplicacy 38553 38561 duplicacy            drz   12u     sock                0,9      0t0    3408904 protocol: TCPv6
duplicacy 38553 38561 duplicacy            drz   13u     sock                0,9      0t0    3407460 protocol: TCPv6

The number of these grows steadily over time.
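
For anyone curious about the mechanics, here is a minimal Go sketch of the kind of retry loop involved and of what closing the response body on retry looks like. This is not go-dropbox’s actual code; the function names, status handling, backoff values, and placeholder URL are made up for illustration.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"time"
)

// doWithRetry is a minimal sketch of a retry loop around http.Client.Do.
// It only illustrates where the leak happens and what the fix amounts to.
func doWithRetry(client *http.Client, newRequest func() (*http.Request, error)) (*http.Response, error) {
	backoff := 500 * time.Millisecond
	for {
		req, err := newRequest()
		if err != nil {
			return nil, err
		}
		resp, err := client.Do(req)
		if err != nil {
			return nil, err
		}
		// Non-retryable status: hand the response to the caller,
		// who is responsible for closing resp.Body.
		if resp.StatusCode != 401 && resp.StatusCode != 429 && resp.StatusCode < 500 {
			return resp, nil
		}
		// Without the next two lines, the abandoned response keeps its TCP
		// connection pinned, and every retried request leaks another socket.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
		time.Sleep(backoff)
		backoff *= 2
	}
}

func main() {
	// Placeholder URL just to make the sketch runnable; the real client
	// would be building Dropbox API requests here.
	client := &http.Client{Timeout: 60 * time.Second}
	resp, err := doWithRetry(client, func() (*http.Request, error) {
		return http.NewRequest("GET", "https://example.com/", nil)
	})
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```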

Pull request to fix this: “Close HTTP response when retrying on errors” (gilbertchen/go-dropbox#5, by northnose).

After running it for a few hours with this fix, these sockets no longer appear in lsof. A smoking gun is that after this fix, when threads are stalled on 429’s (i.e. console is stuck on something like “POST https://content.dropboxapi.com/2/files/download returned 429; retrying after 300 seconds”), these sockets do appear in lsof, but as soon as the backoff expires (and thus the connection-close code added in the PR runs), they disappear.