Memory usage optimization

gchen · 25 October 2021 19:10

All changes to reduce memory consumption are now available in this PR: "Rewrite the backup procedure to reduce memory usage" (Pull Request #625, gilbertchen/duplicacy on GitHub).

You can compile the new version from the source, or ask here for a pre-built binary. At this time I would not suggest it for production use.

A slightly different snapshot format is used, which means:

The new CLI can read existing backups, so it can replace the old CLI to work with existing storages without any backward compatibility issue.
The old CLI can’t read backups created by the new CLI.

A few notes on what to expect about the memory usage reduction:

When you run the new CLI for the first time and the previous backup was created by the old CLI, you won't see any reduction, because the new CLI has to load the file list from the previous backup (in the old format) into memory.
After the first backup by the new CLI is done, you'll see the memory usage cut in half in subsequent backups. This is because the file list from the previous backup (which uses the new format) doesn't need to be kept in memory, while the file list of the current backup still remains in memory.
You can now set the -max-in-memory-entries option to further limit the amount of memory used by the current backup. After this limit is reached, the file list will be removed from memory and serialized into a disk file instead, and you'll see a big drop in memory usage.
You won't see constant usage even with -max-in-memory-entries 0 if you have a lot of files to upload, because the list of modified files is still kept in memory, but this list should take much less space.
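
The spill-to-disk behaviour described in the notes above can be illustrated with a minimal sketch. This is not the code from the PR: the `entry` and `spillList` types, the JSON-lines temp file, and all function names are hypothetical, chosen only to show the idea of keeping entries in a slice until a limit is exceeded and then serializing the whole list to disk.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// entry is a hypothetical, simplified stand-in for a file list entry.
type entry struct {
	Path string
	Size int64
}

// spillList keeps at most maxInMemory entries in a slice. Once that limit
// would be exceeded, it writes everything to a temp file and appends there.
type spillList struct {
	maxInMemory int
	inMemory    []entry
	file        *os.File
	enc         *json.Encoder
}

func newSpillList(maxInMemory int) *spillList {
	return &spillList{maxInMemory: maxInMemory}
}

func (l *spillList) spilled() bool { return l.file != nil }

func (l *spillList) add(e entry) error {
	if l.file == nil && len(l.inMemory) < l.maxInMemory {
		l.inMemory = append(l.inMemory, e)
		return nil
	}
	if l.file == nil {
		// Limit reached: move the in-memory entries to disk and free the slice.
		f, err := os.CreateTemp("", "entries-*.jsonl")
		if err != nil {
			return err
		}
		l.file, l.enc = f, json.NewEncoder(f)
		for _, old := range l.inMemory {
			if err := l.enc.Encode(old); err != nil {
				return err
			}
		}
		l.inMemory = nil
	}
	return l.enc.Encode(e)
}

// each visits all entries in insertion order, from memory or from disk.
func (l *spillList) each(fn func(entry) error) error {
	if l.file == nil {
		for _, e := range l.inMemory {
			if err := fn(e); err != nil {
				return err
			}
		}
		return nil
	}
	if _, err := l.file.Seek(0, io.SeekStart); err != nil {
		return err
	}
	// Restore the write position so later add calls keep appending.
	defer l.file.Seek(0, io.SeekEnd)
	dec := json.NewDecoder(l.file)
	for {
		var e entry
		if err := dec.Decode(&e); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		if err := fn(e); err != nil {
			return err
		}
	}
}

// close removes the temp file, if one was created.
func (l *spillList) close() {
	if l.file != nil {
		l.file.Close()
		os.Remove(l.file.Name())
	}
}

func main() {
	list := newSpillList(2) // plays the role of a small in-memory limit
	defer list.close()
	for i := 0; i < 5; i++ {
		if err := list.add(entry{Path: fmt.Sprintf("file%d", i), Size: int64(i)}); err != nil {
			panic(err)
		}
	}
	var total int64
	if err := list.each(func(e entry) error { total += e.Size; return nil }); err != nil {
		panic(err)
	}
	fmt.Println("spilled:", list.spilled(), "in memory:", len(list.inMemory), "total size:", total)
}
```

With a limit of 0 the sketch writes every entry straight to disk, which mirrors the -max-in-memory-entries 0 case: only whatever is tracked separately (here nothing, in the real CLI the list of modified files) stays in memory.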

This is the struct used by the list of modified files:

github.com/gilbertchen/duplicacy/blob/d9f6545d63a53cd8e6f6c44713bff8bb2dbbaf49/src/duplicacy_entrylist.go#L21-L26

    // This struct stores information about a file entry that has been modified
    type ModifiedEntry struct {
        Path string
        Size int64
        Hash string
    }

This is the struct used by the list of all files:

github.com/gilbertchen/duplicacy/blob/d9f6545d63a53cd8e6f6c44713bff8bb2dbbaf49/src/duplicacy_entry.go#L36-L54

    // Entry encapsulates information about a file or directory.
    type Entry struct {
        Path string
        Size int64
        Time int64
        Mode uint32
        Link string
        Hash string

        UID int
        GID int

        StartChunk  int
        StartOffset int
        EndChunk    int
        EndOffset   int

        Attributes *map[string][]byte
    }

As you can see, ModifiedEntry is much smaller than Entry. Not only that, ModifiedEntry doesn't include the file attributes, which can in theory be unbounded in size.
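
To put rough numbers on that comparison, here is a small sketch that copies the two struct definitions quoted above and prints their fixed sizes with `unsafe.Sizeof`. The helper functions `modifiedEntrySize` and `entrySize` are illustrative names, not part of Duplicacy. Note that `unsafe.Sizeof` only counts the struct itself (string headers, integers, one pointer), not the bytes of the path and hash strings or the contents of the attributes map, so it understates the real footprint of both types.

```go
package main

import (
	"fmt"
	"unsafe"
)

// ModifiedEntry and Entry are copied from the definitions quoted in the post.
type ModifiedEntry struct {
	Path string
	Size int64
	Hash string
}

type Entry struct {
	Path string
	Size int64
	Time int64
	Mode uint32
	Link string
	Hash string

	UID int
	GID int

	StartChunk  int
	StartOffset int
	EndChunk    int
	EndOffset   int

	Attributes *map[string][]byte
}

// modifiedEntrySize returns the fixed per-entry size of ModifiedEntry.
func modifiedEntrySize() uintptr { return unsafe.Sizeof(ModifiedEntry{}) }

// entrySize returns the fixed per-entry size of Entry, excluding anything
// the Attributes pointer refers to.
func entrySize() uintptr { return unsafe.Sizeof(Entry{}) }

func main() {
	fmt.Printf("ModifiedEntry: %d bytes\n", modifiedEntrySize())
	fmt.Printf("Entry:         %d bytes\n", entrySize())
}
```

On a 64-bit platform the fixed parts come to 40 bytes for ModifiedEntry and 128 bytes for Entry, before counting string contents or attributes.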
