I have 3.5M chunks and duplicacy uses about 12GB of memory to check all the chunks. I’m not a Go expert, but I narrowed it down. It’s not actually checking the chunks that consumes all the memory; it’s the storage of the data that eventually becomes the “stats” in the log file.
Basically, there are 3 maps used: chunkSizeMap, chunkUniqueMap, and chunkSnapshotMap. Each of those uses the chunkId as the key. This means that the large chunkId string is stored in memory 3 times just to store maps of otherwise small variables (int64, bool, and int).
One simple solution is to use a single map keyed by chunkId whose value is a struct holding size, unique, and snapshotId. This way the long chunkId is only stored once, and there’s only one map to maintain. Some quick calculations suggest this should cut the memory usage of check roughly in half.
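To make the idea concrete, here is a minimal sketch of the single-map layout. The names `chunkInfo` and `recordChunk` are made up for illustration and are not taken from the duplicacy source. The rule that a chunk stops being unique once a second snapshot references it is my assumption about what the stats need.

```go
package main

import "fmt"

// chunkInfo is a hypothetical struct that groups the three per-chunk values
// (int64, bool, int) that would otherwise live in three separate maps.
type chunkInfo struct {
	size          int64
	unique        bool
	snapshotIndex int
}

// recordChunk is a hypothetical helper: it stores a chunk the first time it is
// seen and clears the unique flag when a different snapshot references it.
func recordChunk(chunks map[string]chunkInfo, chunkID string, size int64, snapshotIndex int) {
	info, seen := chunks[chunkID]
	if !seen {
		chunks[chunkID] = chunkInfo{size: size, unique: true, snapshotIndex: snapshotIndex}
		return
	}
	if info.unique && info.snapshotIndex != snapshotIndex {
		// Map values are copies, so write the modified struct back.
		info.unique = false
		chunks[chunkID] = info
	}
}

func main() {
	// One map, so each chunkId key is stored once instead of three times.
	chunks := make(map[string]chunkInfo)

	recordChunk(chunks, "chunk-a", 4096, 0)
	recordChunk(chunks, "chunk-b", 8192, 0)
	recordChunk(chunks, "chunk-a", 4096, 1) // second snapshot shares chunk-a

	for id, info := range chunks {
		fmt.Printf("%s size=%d unique=%t\n", id, info.size, info.unique)
	}
}
```

Storing the struct by value keeps everything in the map's own buckets and avoids one extra allocation per chunk. The trade-off is that every update has to write the struct back into the map.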
I’m not a Go expert, so I don’t want to submit a PR. I did take a shot at modifying the relevant methods here, though: duplicacy_snapshotmanager.go · GitHub
@gchen I hope that helps. I’d love to reduce the amount of RAM I’m allocating to duplicacy.
Thanks!