Duplicacy GUI 'list files' has impossibly slow performance and memory usage

gui

#1

I’m hoping to purchase and use Duplicacy for a number of computers, but I’m running into one issue that’s pretty problematic.

In all other regards, including the CLI, Duplicacy is brilliant!

If I want to restore a single file or other subset of a repo, listing files via the GUI to do so takes a prohibitive amount of time. By way of comparison, listing all the files in the revision from the CLI executes in only ~20 seconds.

Reproduction steps:

  • I’m running the Duplicacy GUI version 2.0.8 on Windows.
  • For now I’m just backing up to another machine on the network over Samba (the repo URL is of the form samba://U:\backup) - the backups were performed from the Duplicacy GUI.
  • I have three revisions so far, spaced an hour apart each (with a very small diff between each of them).
  • The storage contains ~152GiB, made from a repo of size ~195GiB.
  • The backup contains about 1.1 million files.

Here’s some output that demonstrates how this performs from the CLI. Despite the ‘sh’ prompt, this is being run on Windows: I’m only using MinGW to get the time command, and the duplicacy binary being run is the Windows PE executable.

sh-4.3$ time duplicacy -background -log list -id [my-snapshot-id] -r 3 -files > file_list.txt

real    0m20.892s
user    0m0.000s
sys     0m0.030s

sh-4.3$ wc -l file_list.txt
1149213 file_list.txt

sh-4.3$ du -sh file_list.txt
258M    file_list.txt

Above I’ve used exactly the command that duplicacy runs from the GUI, as noted by its ‘Running [command]’ dialog.

It looks like the GUI is very slowly receiving the piped output from the CLI tool, which is presumably alive waiting to emit all of its output.

I understand that the GUI has to parse the log output from the CLI to display its tree, but hopefully there’s a path to optimising this (whether this involves changing the CLI’s output format to something more amenable or building the tree in a different way)?
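To illustrate the tree-building step mentioned above, here is a minimal sketch in Go. It assumes each output line has already been reduced to a slash-separated path (the exact `list -files` line format is not reproduced here), and the names `Node`, `insert`, and `buildTree` are hypothetical, not taken from Duplicacy. Each path costs one map lookup per component, so a list of about a million paths should be built in time proportional to the total number of path components.

```go
package main

import (
	"fmt"
	"strings"
)

// Node is a hypothetical tree node: a file or directory name plus its children.
type Node struct {
	Name     string
	Children map[string]*Node
}

func newNode(name string) *Node {
	return &Node{Name: name, Children: make(map[string]*Node)}
}

// insert walks the path one component at a time, creating missing nodes.
// Each component is a single map lookup, so the cost is proportional to
// the depth of the path.
func (n *Node) insert(path string) {
	cur := n
	for _, part := range strings.Split(path, "/") {
		if part == "" {
			continue // skip empty components from leading or trailing slashes
		}
		child, ok := cur.Children[part]
		if !ok {
			child = newNode(part)
			cur.Children[part] = child
		}
		cur = child
	}
}

// buildTree inserts every path under a single unnamed root node.
func buildTree(paths []string) *Node {
	root := newNode("")
	for _, p := range paths {
		root.insert(p)
	}
	return root
}

func main() {
	root := buildTree([]string{"docs/a.txt", "docs/b.txt", "src/main.go"})
	fmt.Println("top-level entries:", len(root.Children))
	fmt.Println("entries under docs:", len(root.Children["docs"].Children))
}
```

A GUI could populate only the expanded directories from such a tree rather than rendering every entry up front.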

As things stand, restoring from the GUI is almost completely impractical (I’ve not yet been able to wait long enough to see the restore list populate, having given up after 15 minutes or so of waiting, whilst the ‘Running [command]’ GUI displays as an always-on-top window on my machine!).

It would be really good to be able to use the GUI for restores, so I’ll leave it at that and hopefully there’s an answer to this on the near horizon!

Here’s the context on the underlying processes:




#2

Thank you for providing such detailed information. I’ve already started to rewrite the GUI version in Go so it will call all functions directly rather than relying on the CLI version, so this slow restore problem will go away naturally. However, your post made me wonder if I should fix this problem in the current version first. The code to parse the output from the CLI version is very inefficient: due to a limitation of the wxWidgets API, the GUI version can only read one byte at a time, and it has to be done in the main thread (that is why the running dialog is always on top).


#3

This post is a few months old, but I have a follow-up concern. My backup is ~500GB, and the GUI is impossible to use to restore, as tekacs already described. Running list files on the CLI takes only a few seconds, but the resulting output log file is 130MB.

I wonder whether parsing long log files is really a flexible enough approach to restore, even if it is fast(er). My understanding of gchen’s comment is that a Go version would still rely on downloading the file list of a revision, and then parsing it.

It would be nice if future versions of the Duplicacy GUI would provide enhanced restore functions, such as listing all different versions of a file/folder, searching for files/folders (and displaying their version history), similar to what CrashPlan, Cloudberry and others can do. But, I think, this would imply downloading the entire file list history (of all revisions), doing some complex parsing, and showing the output. This does not sound feasible.

I would be interested in hearing your thoughts on this.


#4

A Go implementation of the GUI version will be able to retrieve the file list as the value returned by calling the list function, so no parsing will be needed.

I think those functions can be supported by maintaining a local SQL database of files that have been backed up. Every time a backup is performed this local database will be updated accordingly. This database will then have the entire version history of all files backed up locally (however, for files backed up to the same storage but from different computers, more work needs to be done).
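As a rough illustration of the idea, here is a minimal sketch in Go that uses an in-memory map as a stand-in for the local SQL database. Everything in it is an assumption made for the example: the names `FileVersion`, `Index`, `RecordBackup`, and `History` are hypothetical, a content hash is assumed to identify a file version, and revisions are assumed to be recorded in ascending order. Deleted files are not tracked.

```go
package main

import "fmt"

// FileVersion is a hypothetical record: one entry per distinct version of a path.
type FileVersion struct {
	Revision int
	Hash     string
}

// Index maps a file path to every recorded version of it, oldest first.
// It plays the role that the local SQL database would play in the proposal.
type Index map[string][]FileVersion

// RecordBackup adds the files seen in one revision (path -> content hash).
// A file whose hash is unchanged since its last recorded version is not
// stored again.
func (idx Index) RecordBackup(revision int, files map[string]string) {
	for path, hash := range files {
		versions := idx[path]
		if n := len(versions); n > 0 && versions[n-1].Hash == hash {
			continue
		}
		idx[path] = append(versions, FileVersion{Revision: revision, Hash: hash})
	}
}

// History returns every distinct version of a path from local data only,
// without contacting the storage.
func (idx Index) History(path string) []FileVersion {
	return idx[path]
}

func main() {
	idx := Index{}
	idx.RecordBackup(1, map[string]string{"a.txt": "h1", "b.txt": "h2"})
	idx.RecordBackup(2, map[string]string{"a.txt": "h1", "b.txt": "h3"})
	fmt.Println("versions of a.txt:", len(idx.History("a.txt")))
	fmt.Println("versions of b.txt:", len(idx.History("b.txt")))
}
```

A real database would add persistence and search by name, but the update-on-backup and lookup-by-path flow would follow the same pattern.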


#5

That’s interesting. I am looking forward to the Go implementation.

Is ‘advanced’ restore with the functions we discussed something that is on your roadmap? In a scenario of same storage/many computers a hosted SQL server on the storage computer would come in handy, but that would only be possible on Windows/Linux servers I think. But I would love this functionality…


#7

Yes, it is on the roadmap but it will likely be done after the next major GUI version is out.


#8

Hi,

I’m a new Duplicacy GUI user. I’m also finding that listing all the files in a rather small backup (~200GB) takes hours.
As the last comment is from almost a year ago, can you please provide an update on this?

Thanks :)


#9

Here’s a preview of the web interface: Screenshots of the new web-based GUI


#10

Currently I’m targeting a month from now to release a beta of the new GUI.