Sorry, no progress so far. I’ll be focusing on the new web-based GUI in the next two months and hope to tackle this problem after that.
First, thanks for writing Duplicacy. I’ve had to suffer through many slow, difficult-to-use backup solutions in the past, whereas Duplicacy is quick and pretty much effortless.
I recently ran my first backup to S3, using the normal (non-RSA) encryption and a small filters list, with Duplicacy 2.3.0 on a Linux x86-64 system. By around 25% through the backup, the process was using ~1.1GiB RAM. By the end, it was using ~1.5GiB of RAM. The system barely had enough memory to finish the job.
The backup command was simply duplicacy backup -stats -threads 8. These were the final stats for the backup:
Files: 159003 total, 44,738M bytes; 159003 new, 44,738M bytes
File chunks: 9093 total, 44,738M bytes; 9084 new, 44,721M bytes, 42,754M bytes uploaded
Metadata chunks: 13 total, 51,830K bytes; 13 new, 51,830K bytes, 18,672K bytes uploaded
All chunks: 9106 total, 44,788M bytes; 9097 new, 44,772M bytes, 42,772M bytes uploaded
Assuming that memory use scales ~linearly with the number of files, I don’t understand how anyone could back up millions of files without dedicating 16GiB or more of RAM to the Duplicacy process alone. By my calculation this is ~10KiB of memory per backed-up file. Does this seem correct?
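For reference, the rough arithmetic behind that estimate:
1.5 GiB ≈ 1,572,864 KiB
1,572,864 KiB / 159,003 files ≈ 9.9 KiB per file
2,000,000 files × ~10 KiB/file ≈ 19 GiB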
Is this problem supposed to have been fixed already? If not, is it on the roadmap to be fixed in the next few months? Should I expect the same memory use on every incremental backup and prune operation?
Thank you again, and thanks for the response. Let me know if I can give any more information.
The number of files is only one factor. The number of threads is another. Moreover, Go being a garbage-collected language can cause more memory to be used than is actually needed. Therefore, you can’t simply extrapolate the actual memory usage from a small data set.
Thank you for the quick response! If this is partially a GC-related issue, does setting GOGC to a lower-than-default value help, in your experience? Also, how about those other questions? I have some additional follow-ups, but I don’t want to bombard you with a bunch of extra questions since I am sure you are busy, and if the answer is “improvements are coming” then I don’t need to waste your time with them. Thanks!
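For example, I was thinking of something along these lines, purely as an illustration (20 being an arbitrary value below the default of 100):
GOGC=20 duplicacy backup -stats -threads 8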
I think GOGC should help, but I don’t know by how much.
Improvements to memory usage are planned, but I haven’t really started working on them.
Questions are always welcome. For those related to memory usage, there isn’t a simple proportional formula to predict it, so the best way is to try it out yourself.
@gchen I am also running into a VirtualAlloc failure - see the exception stack attached.
hiavi-DuplicacyMemoryExceptionDuringBackupAfter32.2percent.txt (11.5 KB)
One odd thing was that the allocation attempt was for zero bytes.
runtime: VirtualAlloc of 0 bytes failed with errno=1455
fatal error: runtime: failed to commit pages
My repository is ~2TB with ~553K files spread across ~52K folders.
So, what are my options here? Are they just:
- Retry with DUPLICACY_ATTRIBUTE_THRESHOLD set to 1
- Break the repository into smaller subsets and back them up into the same storage one after the other.
Is that it?
That is an out-of-memory error. Yes, you can try option 1 first and if that doesn’t help then option 2.
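For option 1, the variable just needs to be set in the environment the CLI runs in; on Windows, for example, something along these lines in the same cmd session (adjust the backup command to your own):
set DUPLICACY_ATTRIBUTE_THRESHOLD=1
duplicacy backup -stats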
Just FYI:
- I tried option 1 on my Windows machine with 16GB RAM, but still ran into the same issue.
- Then I tried it on a MacBook Pro, also with 16GB RAM, and it ran smoothly. In both cases I used the CLI.
In both cases I closed all other apps, but there were surely background services running, resulting in a different profile of available RAM.
Has anyone observed significant memory usage differences between different OS platforms?
Linux/Unix seems to be more conservative with memory.
Hello, I also just got this issue (7-9TB, 2+ million files, 4GB RAM, Debian). I am using the web UI. Where do I enter “DUPLICACY_ATTRIBUTE_THRESHOLD”? I tried to add it to the globals and options in the backup section, but this disables the backup process…
I am really happy with Duplicacy (it works perfectly on an 8GB RAM Debian machine)! Thank you very much!
There is no easy way to set the environment variable for the CLI from the web GUI. Your best option might be to divide the big backup job into several smaller ones, using a different set of filters for each.
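As a rough sketch of the idea (the folder names are just placeholders, and the exact include/exclude pattern rules, such as whether parent directories need their own entries, are described in the filters documentation), one smaller job’s filters could look like
+photos/*
-*
and another job’s like
+documents/*
-*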
Ok, thank you for your answer. So I have to set up 10 different backups for the subfolders to split it up? Does this reduce the deduplication?
Is there any plan/timeline for fixing this? This problem has existed for over 4 years now.
Thank you very much!
I’m planning a big rewrite of the backup engine and hope to get it done in 2 months.
Awesome! If you need testers feel free to send me a message! Thank you very much!
I too am having memory issues while backing up a large number of files, currently on Web Edition 1.5.0. I understand from a post a few years ago that you’re working on changing the architecture:
There is no need to load the entire file list into memory at once. My plan is to construct the file list on the fly and upload file list chunks as soon as they have been generated.
Has the rewrite been released yet?
Update: This affects restore operations as well. If a backup took 64GB of RAM to run, it also seems to take 64GB of RAM to load the file list during a restore. Is this something that’s still being actively worked on, or is the change to avoid loading the entire file list into memory already live?
I’ve finished all the code changes and am now working on the tests. Sorry for postponing the release multiple times. This summer has been really slow for me, but as the kids are going back to school very soon, September will look much better. There should be enough time to finish testing and release the new version by the end of September.
In case I can add another use case: I’m trying to run Duplicacy as a Docker container on a Synology that unfortunately has only 1GB of RAM, and I haven’t been able to get past the Indexing phase of a backup because it ran out of memory (I think; I can only see “signal: killed” in the logs).
So not keeping the list of files in memory would be a great improvement for me!
Looking forward to the new update and thanks for your effort!
That’s a bad use case, though. 1GB is barely enough to run the NAS services alone, much less other apps. Using up that RAM with a non-essential service like backup forces disk cache eviction, which completely kills any hope of getting any sort of responsiveness from the storage subsystem. I would strongly suggest getting another compute appliance to run applications on, leaving the RAM on the NAS for disk cache. There are many other reasons why it’s a bad idea to run third-party apps on a storage appliance without ECC RAM (row hammer comes to mind).
If you do, at least get rid of Docker. It’s completely unnecessary, and yet it eats up RAM for itself and for the whole, albeit small, usermode environment. Duplicacy can run natively on all but two Synology DiskStation models (the exceptions are two PowerPC models). In other words, anything you can do to increase the amount of unused RAM will do wonders for filesystem performance.
Any idea when that will be available? I’m running into the memory issue on multiple systems. Is there a prebuilt beta available I could use instead?