Suggested fix for massive memory usage in "check"

I have 3.5M chunks, and duplicacy uses about 12GB of memory to check them all. I’m not a Go expert, but I think I’ve narrowed it down: the memory isn’t consumed by checking the chunks themselves, but by the data collected along the way that eventually becomes the “stats” in the log file.

Basically, three maps are used: chunkSizeMap, chunkUniqueMap, and chunkSnapshotMap. Each one uses the chunkId as its key, which means the long chunkId string is stored in memory three times just to hold otherwise small values (an int64, a bool, and an int, respectively).
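A minimal sketch of the layout described above, assuming three maps keyed by a string chunk ID with int64, bool and int values, as the post says. The threeMapStats type, its constructor and the record method are hypothetical helpers for illustration, not duplicacy's actual code, and the comments on what each value means are assumptions.

```go
package main

import "fmt"

// threeMapStats mirrors the described layout: one map per statistic,
// each keyed by the chunk ID. Hypothetical type, not duplicacy's code.
type threeMapStats struct {
	chunkSizeMap     map[string]int64 // chunk ID -> chunk size in bytes
	chunkUniqueMap   map[string]bool  // chunk ID -> per-chunk flag (assumed meaning: unique to one snapshot)
	chunkSnapshotMap map[string]int   // chunk ID -> per-chunk snapshot index (assumed meaning)
}

func newThreeMapStats() *threeMapStats {
	return &threeMapStats{
		chunkSizeMap:     make(map[string]int64),
		chunkUniqueMap:   make(map[string]bool),
		chunkSnapshotMap: make(map[string]int),
	}
}

// record inserts the same chunk ID into all three maps, so every chunk
// occupies three separate hash-table entries (key header plus value each).
func (s *threeMapStats) record(chunkID string, size int64, unique bool, snapshot int) {
	s.chunkSizeMap[chunkID] = size
	s.chunkUniqueMap[chunkID] = unique
	s.chunkSnapshotMap[chunkID] = snapshot
}

func main() {
	s := newThreeMapStats()
	s.record("a1b2c3", 4<<20, true, 0)
	s.record("d4e5f6", 1<<20, false, 1)
	fmt.Println(len(s.chunkSizeMap), len(s.chunkUniqueMap), len(s.chunkSnapshotMap))
}
```

Each map keeps its own bucket slot for every chunk, including its own copy of the key's string header, even when the underlying chunk ID bytes are the same.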

One simple solution is a single map from chunkId to a struct holding the size, unique flag, and snapshotId. That way there is only one map, and the long chunkId is stored only once. Some quick calculations suggest this should cut the memory usage of check roughly in half.
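A sketch of the consolidated layout, assuming the same three values per chunk. chunkInfo, chunkStats, record and markShared are hypothetical names chosen for illustration; they are not taken from duplicacy or from the linked file.

```go
package main

import "fmt"

// chunkInfo bundles the three per-chunk values that were previously spread
// across three maps. Hypothetical type for illustration.
type chunkInfo struct {
	size     int64
	unique   bool
	snapshot int
}

// chunkStats keeps one map entry per chunk ID instead of three.
type chunkStats map[string]chunkInfo

// record stores all three values for a chunk in a single entry.
func (c chunkStats) record(chunkID string, size int64, unique bool, snapshot int) {
	c[chunkID] = chunkInfo{size: size, unique: unique, snapshot: snapshot}
}

// markShared updates one field. Map values are not addressable in Go,
// so the struct is read, modified and written back.
func (c chunkStats) markShared(chunkID string) {
	info, ok := c[chunkID]
	if !ok {
		return // unknown chunk: leave the map unchanged
	}
	info.unique = false
	c[chunkID] = info
}

func main() {
	stats := make(chunkStats)
	stats.record("a1b2c3", 4<<20, true, 0)
	stats.markShared("a1b2c3")
	fmt.Println(len(stats), stats["a1b2c3"].unique)
}
```

Storing chunkInfo by value rather than as a *chunkInfo pointer avoids one extra heap allocation per chunk; the trade-off is the read-modify-write needed in markShared to change a single field.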

Since I’m not a Go expert, I don’t want to submit a PR, but I did take a shot at modifying the relevant methods here: duplicacy_snapshotmanager.go · GitHub

@gchen I hope that helps. I’d love to reduce the amount of RAM I’m allocating to duplicacy.

Thanks!

A map containing 3.5M chunk hashes should not use more than 1GB of memory. Besides, strings in Go are immutable, so I believe the same string data is shared across multiple maps rather than copied.

Can you confirm that you are not hitting this bug: Clear the loaded content after a snapshot has been verified · gilbertchen/duplicacy@5d45999 · GitHub (fixed in 2.6.0)?