Duplicacy Web UI hangs forever

Please describe what you are doing to trigger the bug:

I was using Duplicacy to back up a Synology NAS, through the web UI. I opened a lot of log tail requests.

Please describe what you expect to happen (but doesn’t):

The web UI should respond.

Please describe what actually happens (the wrong behaviour):

Requests to the web UI hang forever (e.g. /, /backup, /storage), except for homepage requests when logged out, which immediately return HTTP 200.

I have investigated this issue a bit, attached a debugger, and found the requests are blocked trying to acquire the mutex main.ConfigLock. Goroutines keep piling up with each new request. I’m not sure which goroutine holds the ConfigLock, though. I documented my investigation in a Twitter thread: https://twitter.com/MarkHnsn/status/1621383867088404481.

I think that when the browser closes the “log tail” tab, closing its connection, the log-tail goroutine should exit too. It looks like these goroutines are leaking.

Version: 1.6.3.

Open http://host:3875/print_stack_trace in the browser and you’ll see the full stack trace in the log file ~/.duplicacy-web/logs/duplicacy_web.log.

I believe this handler won’t block on ConfigLock.

The log tail handler, show_log, does not acquire ConfigLock either, so I’m not sure which handler acquires it (and never returns). The full stack trace should give us a hint.

I restarted my Duplicacy instance, so I don’t have a running reproduction case right now (but I’ll run those commands above if it recurs).

Luckily, I found a goroutines -t stack trace dump of all goroutines in my terminal scrollback buffer:

https://drive.google.com/file/d/1OArFJHe-OdO_hViD5DFQVLe7IQ8b6tEj/view?usp=sharing

(sorry, I can’t upload an attachment directly as a new user)

Goroutine 6806 is the culprit:

  Goroutine 6806 - User: /Users/gchen/zincbox/go/src/github.com/gilbertchen/duplicacy-web/dwe/dwe_backup.go:357 main.startStopBackupHandler (0xa7bb4b) [chan send 449579507510816]
         0  0x0000000000437cb6 in runtime.gopark
             at /usr/local/go/src/runtime/proc.go:367
         1  0x00000000004057c5 in runtime.chansend
             at /usr/local/go/src/runtime/chan.go:257
         2  0x000000000040537d in runtime.chansend1
             at /usr/local/go/src/runtime/chan.go:143
         3  0x0000000000a7bb4b in main.startStopBackupHandler
             at /Users/gchen/zincbox/go/src/github.com/gilbertchen/duplicacy-web/dwe/dwe_backup.go:357
         4  0x00000000006bb56f in net/http.HandlerFunc.ServeHTTP
             at /usr/local/go/src/net/http/server.go:2047
         5  0x00000000006bd329 in net/http.(*ServeMux).ServeHTTP
             at /usr/local/go/src/net/http/server.go:2425
         6  0x0000000000a57995 in main.logHTTPRequest.func1
             at /Users/gchen/zincbox/go/src/github.com/gilbertchen/duplicacy-web/dwe/dwe_main.go:867
         7  0x00000000006bb56f in net/http.HandlerFunc.ServeHTTP
             at /usr/local/go/src/net/http/server.go:2047
         8  0x00000000006bef9b in net/http.serverHandler.ServeHTTP
             at /usr/local/go/src/net/http/server.go:2879
         9  0x00000000006ba648 in net/http.(*conn).serve
             at /usr/local/go/src/net/http/server.go:1930
        10  0x00000000006bf8ee in net/http.(*Server).Serve·dwrap·87
             at /usr/local/go/src/net/http/server.go:3034

It held the ConfigLock in question but was blocked on a channel send:

repository.Job.StopChannel <- true

I’m not exactly sure why this channel send blocked. The channel could have become nil due to a race condition, or there could have been multiple simultaneous attempts to stop the backup (the channel has a capacity of 1). In either case, the issue can be fixed simply by adding a timeout here.

Your outstanding analysis has made my job really easy, thank you!

Thanks, glad it was helpful. Thank you for writing duplicacy, I evaluated a lot of backup software and it looks like yours is the best in the space. I wrote a recommendation: Cloud Backup Software.

I just hope adding a timeout won’t mask a latent bug (a race condition)? I sympathise with the challenge of using Go channels correctly; they are fairly tricky. I feel like Go made a huge deal of making concurrency easy, but channels have a lot of sharp edges. Good luck with the fix!

Just in case it got lost above, I also think you have a goroutine leak in the log-tail handler; I think you could return from these goroutines after the browser closes the connection. That’s probably less important (as it doesn’t hurt the user experience much) but just a resource-saving thing.

repository.Job.StopChannel <- true

maybe there were multiple simultaneous attempts to stop the backup

Yes, this happened. I clicked the ‘stop’ button in the browser on a backup, and nothing happened. So I clicked ‘stop’ again. It just kept flipping between the ‘stop’ icon and the ‘play’ icon, and the progress bar didn’t stop. You know how sometimes you Ctrl-C a program and it keeps running, so you press Ctrl-C ten times until it stops? I might have clicked it ten times.

I think you’ll probably want to be robust to this? I think it’s common for browsers to retry operations.

I also think you have a goroutine leak in the log-tail handler; I think you could return from these goroutines after the browser closes the connection

http.ResponseWriter has no close-notification mechanism of its own; there are http.CloseNotifier and http.Request.Cancel, but both are deprecated in favor of selecting on the http.Request.Context().Done() channel.