Help with RAM usage for a 30 TB folder

I just got Duplicacy and I'm running it in a Docker container on Unraid. I am backing up to MinIO S3, and I was able to set that up.

My issue is that I just started backing up a library and I see the Duplicacy Docker container is just eating through the RAM! I'm pretty certain it will use up all the RAM before the backup actually finishes. Usage is very slowly increasing as time goes on… I'm not sure what will happen when it runs out of RAM; does it just start over? I'm not sure what to do. Any suggestions?

Related thread:

I think the current workaround for users with a large number of files in a single repository is to split it up into multiple smaller repositories, either by using filters or by setting up subdirectories as their own repositories.
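For illustration, a filters file for one such smaller repository might look like the sketch below (here `photos` is a hypothetical subdirectory; patterns are evaluated in order and the first match decides, but the exact wildcard semantics should be checked against the Duplicacy include/exclude pattern guide):

```
# .duplicacy/filters: back up only the photos/ subdirectory
+photos/*
-*
```

Everything matching the `+` pattern is included; the final `-*` excludes the rest.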

Thanks for posting this question. I've been having a hard time with Google Drive and large files. I started my backup without verbose logging, so I barely see anything in the logs. Going to try the workaround of splitting into different backups!


It’s about 50,000 files. Not in the millions like I’ve seen people post about.

Also, it indexes fine in the beginning, but memory usage just slowly creeps up throughout the backup as it's transferring the files.

Usually it is the file indexing phase that consumes the most memory, to store the file list. In your case, since you only have that many files, it must be the list of existing chunk hashes that causes the memory usage to keep growing, at a much slower rate (one 32-byte string for each chunk). This chunk hash list is needed to avoid unnecessary lookups into the storage.

I think one thing you can do is to increase the average chunk size from the default 4M to, say, 16M. This will reduce the size of the chunk hash list, although it will increase the buffer size (which stays constant after creation).

To change the average chunk size, you’ll need to run the CLI version with the following command to initialize a new storage:

duplicacy init -c 16M repository_id storage_url
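To get a feel for the numbers, here is a rough back-of-envelope estimate (not Duplicacy code, just arithmetic) of the chunk hash list size for a 30 TB storage, assuming one 32-byte hash per chunk as described above:

```python
TB = 1024 ** 4
MB = 1024 ** 2

def hash_list_bytes(total_bytes, avg_chunk_bytes, hash_bytes=32):
    """Approximate memory held by the in-memory chunk hash list."""
    num_chunks = total_bytes // avg_chunk_bytes
    return num_chunks * hash_bytes

# Default 4M average chunk size vs. the proposed 16M:
print(hash_list_bytes(30 * TB, 4 * MB) // MB, "MB")   # → 240 MB
print(hash_list_bytes(30 * TB, 16 * MB) // MB, "MB")  # → 60 MB
```

Quadrupling the chunk size cuts the hash list to a quarter of its size; the trade-off is larger per-chunk buffers and somewhat less effective deduplication.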

Thank you, but what does a larger buffer size mean in practice? Do you mean it will use more RAM at rest?

Also, I am using Unraid and the Unraid Docker container. I'm not sure how to use the command line with Docker. Does this mean I won't be able to use the web UI?

It will use more RAM at the beginning but the growth rate will be reduced due to fewer chunks.

You can initialize the storage on a different computer.

Got it, thank you will try on my mac.
Quick question: since it's a MinIO storage target, can I just copy and paste the address from the web UI, like: minios://default@…
I will of course make sure to remove that storage ID in Duplicacy and MinIO, since it's already been created…
Also, will it then allow me to select an encryption password after typing that command into the CLI?

Okay, I am getting very confused.
I was able to run it from my Mac. I typed the command you gave, created a storage ID, and entered the MinIO S3 credentials, then got a message that the folder I was in on the Mac would get backed up. Okay…

So now I go back to the web GUI, but how do I import that repository_id to select the correct folder to back up to it? And it never asked me to enter an encryption password either.

Can someone please help me with this? I really want to use Duplicacy for my backups, and this is the piece that needs to be fixed. I originally set up each directory to have its own storage ID in the web UI. Not sure if this helps, but now that I have followed your instructions for the CLI, how do I set an encryption password on it, and then add that storage ID to the web UI?

To enable encryption you’ll need to pass the -e option to the init command:

duplicacy init -e -c 16M repository_id storage_url

You can use any repository id here; it won't be used as long as you only initialize the storage and don't run any backup from where you run the init command.

Thank you,

But you didn't answer how I then import that into the web UI?

Just add that storage in the web UI. You don't need to import the repository id into the web UI; create backups in the web UI and use any backup id you prefer.

So even though I am creating a repository ID in the CLI command, it will apply to the entire storage_url, even if I use a different repository ID in the web UI afterwards?

Okay, so I think something worked, because I just added the storage again in the web UI (after running that CLI command on my Mac) and created a new ID.

So RAM usage started out much higher than before, but it is still growing at the same rate!
I am backing up a 110 GB folder with about 88 files in it. After 7 minutes at 10 MB/s, RAM used is over 1 GB and still growing. I checked the log file and I notice this constantly being written, every second. Also, the address in the log is not even where Duplicacy is running on the Unraid server; that is the local IP address of my Mac, which is what I use to access the Unraid server and what I used to enter the CLI command you posted above. I'm not sure why it keeps referencing it. Could this be the issue?

2019/12/27 20:44:05 POST /get_backup_status
2019/12/27 20:44:06 POST /get_backup_status
2019/12/27 20:44:07 POST /get_backup_status
2019/12/27 20:44:08 POST /get_backup_status
…
2019/12/27 20:44:22 POST /get_backup_status

I don't know if this is what's causing the RAM to grow. Is this normal?

get_backup_status is the API call made by the browser to update the backup progress; it is called once every second.

You didn't say which process's RAM usage keeps growing: the CLI or the web GUI?

I only use the web UI, in Docker form. I used the CLI on my laptop just for the command you gave to change the chunk size. I am using only the web UI for all the backups.

The web UI relies on the CLI to do the backup (and all other jobs), so you'll find both the web UI and the CLI in the list of processes/programs running in the background.

Okay, I am using Docker on Unraid, and the Docker container's RAM usage is just going up. It's at 3 GB of RAM now, after about 24 hours on that share that has 50,000 files and about 35 TB. It seems to have slowed down and is not increasing like it did before, but it is still going up. Am I correct in understanding that the CLI command I ran on a different computer works for the entire storage? And then I could use different backup IDs on the Linux machine to that storage, and it will keep the chunk size at 16M, right? Should it still be using this much RAM, though? I am concerned that it will use up all my RAM by the time the backup is finished. I gave the Docker container 20 GB of RAM…

I CANNOT delete a backup by clicking on the red X under the Backup tab! I am getting the following message:
“Failed to remove the include/exclude pattern file: remove /home/duplicacy/.duplicacy-web/filters/localhost/2: no such file or directory”

So, great news! It looks like the RAM stopped increasing at about 4.5 GB while the backup is proceeding, so I guess it worked.
Just to make sure: that command line I issued applied to the entire storage, correct? Even though I then added backups to it using the web UI on a different machine?

And also, how can I go about deleting a backup ID set? I still get the error “Failed to remove the include/exclude pattern file: remove /home/duplicacy/.duplicacy-web/filters/localhost/2: no such file or directory”

Thank you for helping me with this! Looking forward to using Duplicacy for a long time!