Prune (Dry Run) Memory Usage

I currently have 4 repositories on my NAS that all back up to B2. The NAS is an i3 with 4 GB of RAM, running a Linux variant. The repos are as follows, with approximate figures from duplicacy check -tabular. Each repo has about 150-175 revisions (they all run nightly).

  • files: 5,900 chunks, 7 GB
  • music: 424,000 chunks, 522 GB
  • videos: 19,000 chunks, 22 GB
  • photos: 1,300,000 chunks, 1,500 GB
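For scale, that's roughly 1.75 million chunks across the four repos. As a rough sanity check on whether an in-memory chunk index alone could explain the usage, here is a back-of-envelope estimate (the per-chunk overhead figure is my assumption, not duplicacy's actual data structure):

```python
# Back-of-envelope estimate for holding every chunk hash in memory.
# The per-chunk overhead is a guess; duplicacy's real structures may differ.
chunks = 5_900 + 424_000 + 19_000 + 1_300_000  # ~1.75 M chunks total
per_chunk_bytes = 32 + 100  # 32-byte hash + assumed map/bookkeeping overhead
total_mb = chunks * per_chunk_bytes / 1024**2
print(f"{chunks:,} chunks -> ~{total_mb:.0f} MB for a flat chunk index")
```

Even if a flat index is only a couple hundred MB, holding per-revision chunk lists for 600+ revisions could multiply that several times over, which might be where the 1.6 GB goes.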

So far I have not run prune at all; I never got around to setting up the scripts/scheduling for it, and I am attempting to do so now. Since I am just getting prune set up, I am running it with -dry-run, but three times in a row my system has crashed outright. My guess is that duplicacy consumes all the available RAM (the process shows 1.6 GB in use, with system memory 90+% utilized) and the system simply runs out of resources. Under normal load the system's memory is about 20-30% utilized.

Here is the command that I am running:

duplicacy -log -verbose -debug prune -dry-run -all -storage b2 -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
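For anyone following along, my understanding of the -keep n:m options is "for revisions older than m days, keep one revision every n days" (n = 0 means delete everything older than m), with rules listed from largest m to smallest. A simplified Python sketch of that retention logic, as I understand it (my own approximation, not duplicacy's actual implementation):

```python
def revisions_to_delete(revision_ages, keep_rules):
    """Return the set of revision ages (in days) the rules would delete.

    keep_rules is a list of (n, m) pairs sorted by m descending, read as
    'for revisions older than m days, keep one every n days' (n == 0: keep
    none). A simplified approximation, not duplicacy's exact algorithm.
    """
    delete = set()
    last_kept = {}  # m-threshold -> age of the most recently kept revision
    for age in sorted(revision_ages, reverse=True):  # oldest first
        # The first rule whose threshold the revision exceeds applies.
        rule = next(((n, m) for n, m in keep_rules if age > m), None)
        if rule is None:
            continue  # younger than every threshold: always kept
        n, m = rule
        if n == 0:
            delete.add(age)  # -keep 0:m deletes everything older than m
        elif m not in last_kept or last_kept[m] - age >= n:
            last_kept[m] = age  # far enough from the last kept one: keep it
        else:
            delete.add(age)
    return delete
```

So, as I read it, my command keeps everything up to 7 days old, one revision per day out to 30 days, one per week out to 180 days, one per 30 days out to a year, and nothing older.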

and here are the last 4 lines from the log file (most of the entries are DOWNLOAD_FETCH and CHUNK_CACHE):

2018-07-22 20:02:36.722 DEBUG DOWNLOAD_FETCH Fetching chunk 9e606c48570d5dc71309c3ef155872cf504461ece423993baa68a49a95f74031
2018-07-22 20:02:37.829 DEBUG CHUNK_CACHE Chunk 9e606c48570d5dc71309c3ef155872cf504461ece423993baa68a49a95f74031 has been loaded from the snapshot cache
2018-07-22 20:02:37.832 DEBUG DOWNLOAD_FETCH Fetching chunk 1ec49414357d5ec7a4e09d161c2fa595959fc2520ea8cc5177fde75a8c5b5291
2018-07-22 20:02:39.985 DEBUG CHUNK_CACHE Chunk 1ec49414357d5ec7a4e09d161c2fa595959fc2520ea8cc5177fde75a8c5b5291 has been loaded from the snapshot cache

So, before I start digging through system logs or buying more RAM (I'd probably add another 8 GB for 12 GB total), I just want to ask:

  • For a backup of this size, does 4 GB of memory seem reasonable? Or could there be an issue with how dry run is implemented (consuming more memory since it isn't saving/writing anything), and if so, should I just do the prune for real? Would that be as intensive?
  • Should I try breaking it up to prune only one repo at a time (using the -id <snapshot id> parameter, if I understand it correctly)?
  • If it is normal for a prune to consume this much memory, is that only because I haven't run one against this storage yet, so that subsequent runs will take less? Or should I expect this usage on every run?

I can also run it manually from a computer with more memory as a test. Is there any other info that would be helpful?
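To clarify what I mean by breaking it up, I'm picturing something like this per-repo loop (the snapshot ids are my repo names; whether -id actually bounds peak memory this way is exactly what I'm asking):

```shell
# Hypothetical per-repo prune: one snapshot id at a time instead of -all.
# Whether this actually reduces peak memory usage is an open question.
for id in files music videos photos; do
    duplicacy -log prune -dry-run -id "$id" -storage b2 \
        -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
done
```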

Thanks!


I also ran out of memory very recently when running a prune job. I ran it again and it completed, so I didn't think much of it at the time. That said, I am experiencing a separate issue with prune, though I'm fairly sure it's unrelated.

Rather than running them all together, would running prune with each -keep option independently, one at a time, do the same job?

This PR may significantly reduce the amount of memory needed by the prune command. It was merged after the release of 2.1.0 so you’ll need to build from the latest source on the master branch.
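If it helps, building from master looks roughly like this (the repo layout and Go tooling details are assumptions on my part; adjust for your own Go setup):

```shell
# Rough sketch of building duplicacy from the master branch.
# File paths and Go environment details are assumed; adjust as needed.
git clone https://github.com/gilbertchen/duplicacy.git
cd duplicacy
go build -o duplicacy_cli duplicacy/duplicacy_main.go
```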


Thanks! I'll give it a try as time allows and report back any changes in behavior. I forgot to mention in my original post that I am running version 2.1.0.

Good news! I switched to version 2.1.1 and ran the same duplicacy prune command: total system memory usage never went above 65% (vs. maxing out before), and the entire dry-run operation completed in about 17 minutes. Previously, the system would crash after prune had been running for about 10 minutes. Thank you!
