Memory Usage

CarlNasal · 31 July 2017 10:01

I am testing Duplicacy and Backblaze for Linux server backup to replace IDrive for Linux. It’s working well on smaller servers, but I have a server with 185GB of storage used and 12GB of memory. When I run a backup, it often gets killed by the OOM killer because it’s using so much memory. For example, I ran it last night and it peaked at 3.5GB of memory usage before it was killed. Is there anything I can do to reduce memory usage?

FYI, the storage is encrypted, and I’m running the backup with “duplicacy backup -stats”

Thanks,
Carl

AlexJOST · 31 July 2017 13:53

I’m also having issues when trying to back up about 1 TB of data. Memory usage climbs to 11-12 GB plus 8 GB of swap.

gchen · 1 August 2017 00:56

Memory usage depends heavily on the number of files to be backed up. Duplicacy loads the entire file list into memory during the indexing phase, so you may run out of memory if there are too many files. After the indexing phase, memory usage should stay flat.

Another factor that may dramatically increase the memory usage is extended attributes. When building the file list the extended attributes are also read into memory at the beginning. However, after the number of files exceeds a certain number (controlled by the environment variable DUPLICACY_ATTRIBUTE_THRESHOLD, which defaults to 1 million), Duplicacy will stop loading extended attributes during the indexing phase but instead will only read and upload them when preparing the final snapshot file.

So maybe setting DUPLICACY_ATTRIBUTE_THRESHOLD to a really small number, like 1, will help.
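For anyone who wants to try this from a script, here is a minimal Python sketch that sets the variable only for the Duplicacy process. The helper names `backup_env` and `run_backup` are made up for illustration. The command is the same `duplicacy backup -stats` invocation mentioned earlier in the thread, and the sketch assumes `duplicacy` is on the PATH and that the directory is an already-initialized repository.

```python
import os
import subprocess


def backup_env(attribute_threshold=1, base_env=None):
    """Return a copy of the environment with DUPLICACY_ATTRIBUTE_THRESHOLD set.

    base_env defaults to the current process environment; it is copied,
    never modified in place.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DUPLICACY_ATTRIBUTE_THRESHOLD"] = str(attribute_threshold)
    return env


def run_backup(repository_dir, attribute_threshold=1):
    """Run `duplicacy backup -stats` inside repository_dir (hypothetical helper).

    Assumes the `duplicacy` binary is on the PATH.
    """
    return subprocess.run(
        ["duplicacy", "backup", "-stats"],
        cwd=repository_dir,
        env=backup_env(attribute_threshold),
        check=False,  # let the caller inspect returncode
    )
```

Calling `run_backup("/path/to/repository")` would then start the backup with the threshold set to 1, leaving the rest of the environment untouched.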

CarlNasal · 2 August 2017 13:36

I tried setting DUPLICACY_ATTRIBUTE_THRESHOLD to 1, and it did allow the process to finish, but it only used slightly less memory (the peak was around 3.2GB). Do you have plans to reduce memory usage when there’s a large number of files to back up?

Thanks,
Carl

gchen · 3 August 2017 01:32

Definitely. There is no need to load the entire file list into memory at once. My plan is to construct the file list on the fly and upload file list chunks as soon as they have been generated. This will add significant complexity to the main backup loop, but in the long run should be worth it.

CarlNasal · 3 August 2017 20:14

That’s great to hear. Thanks for that information. I look forward to that update.

Carl

whereisaaron · 4 August 2017 03:04

After the file list phase, do any of the block size settings or other preferences control memory use?

Testing with default settings, I was seeing about 100MB of RAM allocated for the file list phase, and then a fairly flat 450MB during the backup. That is more RAM than I was hoping to use on some clients.

gchen · 4 August 2017 17:16

The default average chunk size is 4MB, but it is the maximum chunk size (16 MB) that determines the size of buffers to be allocated, and there could be multiple buffers. If you set the average chunk size to 1MB when initializing the storage, the default maximum size will be 4MB, and that could reduce the memory footprint a bit.
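As a rough illustration of that relationship, the sketch below estimates buffer memory from the average chunk size. The 4x ratio between average and maximum comes from the two examples above (4MB to 16MB, 1MB to 4MB). The buffer count of 8 is purely a made-up number for the sake of the arithmetic, not a figure taken from Duplicacy, and sizes are treated as MiB.

```python
MIB = 1024 * 1024


def default_max_chunk_size(average_chunk_size):
    """Default maximum chunk size, assuming the 4x ratio from the examples above."""
    return 4 * average_chunk_size


def buffer_memory(average_chunk_size, buffer_count):
    """Upper bound on buffer memory if each buffer holds one maximum-size chunk.

    buffer_count is a hypothetical parameter; the real number of buffers
    is not stated in this thread.
    """
    return buffer_count * default_max_chunk_size(average_chunk_size)


for avg_mib in (4, 1):
    total = buffer_memory(avg_mib * MIB, buffer_count=8)
    print(f"average {avg_mib} MiB -> up to {total // MIB} MiB in buffers")
```

Under these assumptions, dropping the average from 4MB to 1MB cuts the buffer total by a factor of four, whatever the actual buffer count is.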

Harnser · 21 October 2017 09:29

What’s the status of this issue? I have a 1.7TB backup via another provider’s software and want to switch to Duplicacy, but memory usage is definitely going to be important.

gchen · 22 October 2017 03:02

I haven’t had a chance to work on this. However, a 1.7 TB backup may not consume too much memory if the number of files isn’t huge. I know several customers who back up millions of files totaling more than 10 TB.

vbman213 · 3 July 2018 03:39

Any progress on this?

I have an 800GB backup I’m trying to perform on Ubuntu 16.04 with 4GB of RAM and the backup is getting ‘Killed’

gchen · 3 July 2018 13:51

Sorry no progress so far. I’ll be focusing on the new web-based GUI in the next two months and hope to tackle this problem after that.

58eee6f9fff6050e82e1 · 7 February 2020 03:49

First, thanks for writing Duplicacy. I’ve had to suffer through many slow, difficult-to-use backup solutions in the past, whereas Duplicacy is quick and pretty much effortless.

I recently ran my first backup to S3, using the normal (non-RSA) encryption and a small filters list, with Duplicacy 2.3.0 on a Linux x86-64 system. By around 25% through the backup, the process was using ~1.1GiB RAM. By the end, it was using ~1.5GiB of RAM. The system barely had enough memory to finish the job.

The backup command was simply duplicacy backup -stats -threads 8. These were the final stats for the backup:

159003 total, 44,738M bytes; 159003 new, 44,738M bytes
File chunks: 9093 total, 44,738M bytes; 9084 new, 44,721M bytes, 42,754M bytes uploaded
Metadata chunks: 13 total, 51,830K bytes; 13 new, 51,830K bytes, 18,672K bytes uploaded
All chunks: 9106 total, 44,788M bytes; 9097 new, 44,772M bytes, 42,772M bytes uploaded

Assuming that memory use scales ~linearly with number of files, I don’t understand how anyone could back up millions of files without 16GiB or more RAM just to dedicate to the Duplicacy process alone. By my calculation this is ~10K of memory per backed up file. Does this seem correct?
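Here is the arithmetic behind that figure as a short Python sketch. It divides the ~1.5GiB peak by the 159003 files from the stats above and then projects to one million files. It deliberately uses the naive assumption that all memory is per-file and scales linearly, which may well overstate things, since buffers and runtime overhead are lumped in.

```python
GIB = 1024 ** 3
KIB = 1024

peak_bytes = 1.5 * GIB   # peak process memory reported above
file_count = 159_003     # file count from the backup stats

per_file_bytes = peak_bytes / file_count
print(f"{per_file_bytes / KIB:.1f} KiB per file")  # prints "9.9 KiB per file"


def naive_projection_gib(files):
    """Project memory use for `files` files, assuming strictly linear scaling."""
    return files * per_file_bytes / GIB


print(f"{naive_projection_gib(1_000_000):.1f} GiB for 1M files")  # prints "9.4 GiB for 1M files"
```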

Is this problem supposed to have been fixed already? If not, is it on the roadmap to be fixed in the next few months? Should I expect the same memory use on every incremental backup and prune operation?

Thank you again, and thanks for the response. Let me know if I can give any more information.

gchen · 7 February 2020 13:05

The number of files is only one factor. The number of threads is another. Moreover, Go is a garbage-collected language, which can cause more memory to be used than is strictly needed. Therefore, you can’t simply extrapolate the actual memory usage from a small data set.

58eee6f9fff6050e82e1 · 7 February 2020 17:04

Thank you for a quick response! If this is a partially GC-related issue, does setting GOGC to a lower-than-default value help, in your experience? Also, how about those other questions? I have some additional follow-ups but I don’t want to bombard you with a bunch of extra questions since I am sure you are busy, and if the answer is “improvements are coming” then I don’t need to waste your time with them. Thanks!

gchen · 7 February 2020 20:11

I think GOGC should help, but I don’t know by how much.
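To give a feel for what the setting does: the Go runtime starts a new collection once the heap has grown by GOGC percent over the live heap left after the previous collection, and the default is 100. The sketch below applies that rule to a hypothetical 1GiB live heap. It is a simplification that ignores GC roots, goroutine stacks and non-heap memory, and a lower GOGC trades memory for extra CPU time spent collecting.

```python
GIB = 1024 ** 3


def heap_target(live_heap_bytes, gogc=100):
    """Approximate heap size at which Go triggers the next GC cycle.

    Simplified model: live heap plus GOGC percent of the live heap.
    """
    return live_heap_bytes * (1 + gogc / 100)


live = 1 * GIB  # hypothetical live heap, chosen only for illustration
for gogc in (100, 50, 25):
    target = heap_target(live, gogc)
    print(f"GOGC={gogc}: next GC at about {target / GIB:.2f} GiB")
```

In this model the heap can reach roughly twice the live data at the default setting and about 1.5 times at GOGC=50, so the saving is bounded by how much of the process memory is garbage awaiting collection.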

Improvements to memory usage are planned, but I haven’t really started working on them.

Questions are always welcome. For those related to memory usage, there isn’t a simple proportional function to predict, so the best way is to try it out yourself.

hiavi · 6 November 2020 15:02

@gchen I am also running into a VirtualAlloc failure - see exception stack attached.
hiavi-DuplicacyMemoryExceptionDuringBackupAfter32.2percent.txt (11.5 KB)

One odd thing was that the allocation attempt was for zero bytes.

runtime: VirtualAlloc of 0 bytes failed with errno=1455
fatal error: runtime: failed to commit pages

My repository is ~2TB with ~553K files spread across ~52K folders.

So, what are my options here? Are they just:

1. Retry with DUPLICACY_ATTRIBUTE_THRESHOLD set to 1.
2. Break the repository into smaller subsets and back them up to the same storage one after the other.

Is that it?

gchen · 7 November 2020 04:41

That is an out-of-memory error. Yes, you can try option 1 first and if that doesn’t help then option 2.
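If it comes to option 2, it may help to know where the files are concentrated before deciding how to split. The sketch below is one hypothetical way to do that in Python: it counts the files under each top-level directory of the repository. The function name is made up for illustration, and how the resulting subsets are then configured in Duplicacy is not covered here.

```python
import os


def files_per_top_level_dir(repository_root):
    """Count files under each top-level directory of repository_root.

    Returns a dict ordered from the largest count to the smallest.
    Files sitting directly in repository_root are not counted, and
    symlinked directories are not followed.
    """
    counts = {}
    with os.scandir(repository_root) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                counts[entry.name] = sum(
                    len(filenames) for _, _, filenames in os.walk(entry.path)
                )
    return dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))
```

Since memory use is tied to the number of files rather than their total size, splitting along the directories with the highest counts is likely to help the most.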

hiavi · 7 November 2020 23:35

Just FYI

I tried option 1 on my Windows machine with 16GB RAM, but still ran into the same issue.
Then I tried it on a MacBook Pro, also with 16GB RAM, and it ran smoothly. In both cases I used the CLI.

In both cases I closed all other apps, but there may well be background services running, which would leave a different amount of RAM available on each machine.

Has anyone observed significant memory usage differences between different OS platforms?

ajballa555 · 1 January 2021 01:54

Linux / Unix seems to be more conservative with memory.