I'd really like to see optimizations to the "Listing revisions" and "Listing files in revision" steps of the restore process, because they always take hours

From time to time I want to restore some small file, such as a log file that was already deleted by software that only keeps the past week of logs around. In theory Duplicacy works great for that, but in practice the restore process is frustratingly slow.

I first need to select the Backup ID, then wait 20 minutes while it's doing "Listing revisions". After those 20 minutes I need to remember that I actually triggered it, select the revision I need, and then wait another 1-3 hours while it's doing "Listing files in revision". Only after that can I select my 1 MB log file and restore it, which itself goes quickly.

I guess the reason it's so slow is that it needs to download a lot of metadata from the storage backend, but why can it not just cache that metadata locally somehow? I wouldn't mind giving it a few hundred GB of extra local disk space if that would allow me to do quick restores. I can see in Task Manager that Duplicacy is using between 1 and 10 Mbit/s of network bandwidth during the hours it spends on "Listing files in revision", along with ~1 MB/s of disk usage and 0.1% CPU usage.
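Duplicacy doesn't offer such a cache, but the idea described here, fetching revision metadata once and reusing it from local disk afterwards, can be sketched generically. This is a minimal sketch, not Duplicacy code; every name in it (`cached_metadata`, the cache layout) is hypothetical:

```python
import hashlib
import json
import os

def cached_metadata(cache_dir, key, fetch):
    """Return metadata for `key` from a local on-disk cache, calling the
    slow `fetch` function only on a cache miss. `fetch` must return
    JSON-serializable data (e.g. the file list of one revision)."""
    os.makedirs(cache_dir, exist_ok=True)
    name = hashlib.sha256(key.encode()).hexdigest() + ".json"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):          # cache hit: no backend round-trips
        with open(path) as f:
            return json.load(f)
    data = fetch()                    # cache miss: the slow backend listing
    with open(path, "w") as f:
        json.dump(data, f)
    return data
```

Since a snapshot's metadata never changes after the revision is written, a cache like this would only go stale when revisions are pruned, so it seems like a natural fit for the "Listing files in revision" step.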

I'm not sure why it would take this long to list the revisions; it is just one API call. Which backend is this, and how many revisions are there?


Dropbox, but I have also used Duplicacy with Google Drive as the backend for a long time, and that was exactly as slow, so I don't think it's a Dropbox issue.

I don't know exactly how many revisions there are, but the "Storage" → "Revisions" graph in the GUI shows roughly 500 revisions per ID, and I have 8 IDs in total. I am running prune with `-d -keep 2:60 -keep 1:7 -a -threads 10` once a week, so there should never be too many revisions.
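As a rough sanity check on that number: assuming one backup per day and the documented `-keep n:m` semantics (keep one revision every n days for revisions older than m days), a simple simulation can estimate how many revisions one ID accumulates. This is a sketch only; real prune anchors intervals to actual revision timestamps rather than modular day ages:

```python
def revisions_retained(days_of_history, policy):
    """Estimate revisions surviving a Duplicacy-style retention policy.

    policy: list of (interval_days, older_than_days) tuples,
    e.g. [(1, 7), (2, 60)] for "-keep 1:7 -keep 2:60".
    Assumes exactly one backup per day."""
    kept = 0
    for age in range(days_of_history):          # age of each revision in days
        rules = [r for r in policy if r[1] <= age]
        if not rules:
            kept += 1                           # newer than every rule: kept
            continue
        interval, _ = max(rules, key=lambda r: r[1])   # most specific rule wins
        if interval > 0 and age % interval == 0:
            kept += 1                           # interval 0 would mean "delete"
    return kept

print(revisions_retained(365, [(1, 7), (2, 60)]))  # → 213
```

Under this policy the count keeps growing by roughly one revision every two days past the 60-day mark, so reaching ~500 revisions per ID after a few years of daily backups is plausible even with weekly pruning.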

It's the same issue. *Drive-type backends are not optimized for handling millions of files in a bucket; they are designed to hold users' documents and support collaboration.

For better performance you can use S3, B2, or any other protocol that was designed for bulk storage.

Past discussion: (Newbie) Backup entire machine? - #6 by saspus
