Init command details

SYNOPSIS:
   duplicacy init - Initialize the storage if necessary and the current directory as the repository

USAGE:
   duplicacy init [command options] <snapshot id> <storage url>

OPTIONS:
   -encrypt, -e                    encrypt the storage with a password
   -chunk-size, -c <size>          the average size of chunks (default is 4M)
   -max-chunk-size, -max <size>    the maximum size of chunks (default is chunk-size*4)
   -min-chunk-size, -min <size>    the minimum size of chunks (default is chunk-size/4)
   -iterations <i>                 the number of iterations used in storage key derivation (default is 16384)
   -pref-dir <path>                alternate location for the .duplicacy directory (absolute or relative to current directory)
   -storage-name <name>            assign a name to the storage
   -key <public key>               the RSA public key to encrypt file chunks
   -repository <path>              initialize a new repository at the specified path rather than the current working directory

The init command first connects to the storage specified by the storage URL. If the storage has already been initialized, it will download the storage configuration (stored in the file named config) and ignore the options provided on the command line. Otherwise, it will create the configuration file from the options and upload the file.

Duplicacy will create the destination path on the storage if it does not already exist.

The initialized storage will then become the default storage for other commands if the -storage option is not specified for those commands. This default storage actually has a name, default.

After that, it will prepare the current working directory as the repository to be backed up. Under the hood, it will create a directory named .duplicacy in the repository and put a file named preferences that stores the snapshot id and encryption and storage options.

The snapshot id is an id used to distinguish different repositories connected to the same storage. Each repository must have a unique snapshot id. A snapshot id must contain only alphanumeric characters as well as - and _. :bulb: It is important that IDs are unique across machines.
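The character rule above can be checked before running init. This is a minimal Python sketch, assuming the rule means "one or more ASCII letters, digits, - or _"; the helper name `is_valid_snapshot_id` is hypothetical and not part of Duplicacy. It does not check uniqueness across repositories, which still has to be ensured by hand.

```python
import re

# Assumption: a valid snapshot id consists of one or more ASCII letters,
# digits, '-' or '_'. This mirrors the rule stated in the text, not
# Duplicacy's own validation code.
_SNAPSHOT_ID = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_snapshot_id(snapshot_id: str) -> bool:
    """Return True if snapshot_id uses only the allowed characters."""
    return _SNAPSHOT_ID.fullmatch(snapshot_id) is not None
```

For example, `is_valid_snapshot_id("laptop-home_01")` is true, while an id containing a space is rejected.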

The -e option controls whether or not encryption will be enabled for the storage. If encryption is enabled, you will be prompted to enter a storage password. The storage password is used to encrypt the config file, and snapshot files are encrypted by a key derived from the snapshot ID and the revision number. If you have already created an encrypted storage to which you are now connecting, you will have to add the -e flag, so that you are asked to enter the encryption password.

The three chunk size parameters are passed to the variable-size chunking algorithm. Their values are important to the overall performance, especially for cloud storages. If the chunk size is too small, a lot of overhead will be in sending requests and receiving responses. If the chunk size is too large, the effect of de-duplication will be less obvious as more data will need to be transferred with each chunk.
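To make the defaults from the option list concrete, here is a small Python sketch that derives the minimum and maximum chunk sizes from the average. It assumes that "4M" means 4 * 1024 * 1024 bytes; the function name `chunk_size_bounds` is made up for illustration.

```python
from typing import Tuple

# Assumption: "M" is 1024 * 1024 bytes, so the default average is 4 MiB.
DEFAULT_AVERAGE = 4 * 1024 * 1024


def chunk_size_bounds(average: int = DEFAULT_AVERAGE) -> Tuple[int, int]:
    """Return (minimum, maximum) chunk sizes in bytes.

    Follows the documented defaults: minimum is chunk-size/4 and
    maximum is chunk-size*4.
    """
    return average // 4, average * 4
```

With the default average this gives a minimum of 1 MiB and a maximum of 16 MiB.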

The -iterations option specifies how many iterations are used to generate the key that encrypts the config file from the storage password.

The -pref-dir option controls the location of the preferences directory. If not specified, a directory named .duplicacy is created in the repository. If specified, it must point to a non-existing directory. The directory is created and a .duplicacy file is created in the repository. The .duplicacy file contains the absolute path name to the preferences directory.

Once a storage has been initialized with certain -chunk-size parameters, these parameters cannot be modified any more.

The -repository option specifies how the repository root directory is defined in the preferences file. This may be specified as either an absolute or relative path. Relative paths are relative to the current working directory of Duplicacy at the time it is executed (when the preferences file is being parsed). This option allows for the possibility of the repository configuration files and the repository itself being maintained in separate file system locations. When not specified, an empty repository path is written to the preferences file, causing Duplicacy to treat its current working directory as the repository root.

The -key option enables RSA encryption: file chunks are encrypted with the RSA public key that you supply.

I’m using the docker GUI on unraid and I can’t find any info on initializing the storage for large file types. I have a directory with large VM/backup files from ~4GB to 40+GB and my understanding is init -c 1M -min 1M -max 1M is best suited for this, but I can’t see how to do this?

At this time you can’t initialize a storage with fixed-size chunking in the GUI. You’ll need to run the CLI to initialize the storage first and then add the storage in the GUI.
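As a rough illustration of the CLI step, the sketch below builds the argument list for an init call in which the average, minimum and maximum chunk sizes are all equal, as in the question above. The helper name `fixed_chunk_init_args` is hypothetical, and the snapshot id and storage path in the usage note are placeholders.

```python
from typing import List


def fixed_chunk_init_args(snapshot_id: str, storage_url: str,
                          size: str = "1M") -> List[str]:
    """Build the argv for `duplicacy init` with fixed-size chunks."""
    # Using the same value for -c, -min and -max leaves the chunker no
    # room to vary the chunk size.
    return ["duplicacy", "init",
            "-c", size, "-min", size, "-max", size,
            snapshot_id, storage_url]
```

The list can be passed to `subprocess.run(args, check=True)` from inside the repository directory, for example with `fixed_chunk_init_args("vm-backups", "/mnt/backups/duplicacy")`. Afterwards the storage can be added in the GUI.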

@TheBestPessimist - is -storage-name just a repository-side construct, so that one can use a “name” instead of a storage URL in subsequent commands? If so, I presume the actual storage is completely unaware of it, right?

So the encryption password applies to the entire storage, and there is no way to encrypt with a different password at a per-repository level, is that right? Just trying to understand. I like it this way, as one doesn’t have to remember so many passwords.

Does the storage password play any role w.r.t. encryption of snapshot files? The following line seems to indicate not:

"snapshot files are encrypted by a key derived from the snapshot ID and the revision number"

But does that mean that if someone forgets or doesn’t have the storage password, there is a way to resurrect the content using just the snapshot ID and the revision number? I hope not.

This statement is misleading: it doesn’t mention that a random key stored in the config file (and thus protected by the storage password) is also used in the derivation of the snapshot encryption key. So no, you can’t resurrect the content just by using the snapshot ID and the revision number.

Could you add information about the erasure-coding parameter? It took me a while to find the relevant information: New Feature: Erasure Coding

(The erasure coding page can be found under the Advanced usage section of the Duplicacy User Guide that’s pinned to the top of every post.)

Because the basic options displayed by duplicacy init -help don’t include the -erasure-coding parameter, details about it are better left to the existing dedicated post. It’s also likely that only a small percentage of users who are aware of the feature make use of it since it’s resource intensive, so requiring users to dig deeper into the user guide is beneficial – those who put in the effort are also likely the ones who will have an easier time understanding it compared to more casual users.

Plus your post now adds another pointer to the feature so it’s that much harder to miss. :wink:

Is my assumption correct, that the files .duplicacy/[filters|keyrings|known_hosts|preferences] can be copied from an already initialized repository to a new one and only the “id” in the preferences file has to be changed to a unique name instead of using the “duplicacy init” command?

Or asked differently: is it true, that the initialization of a new repository to an already initialized storage does not alter anything on the storage at all?

My tests suggest this. If so, that would make the initialization of multiple repositories (disks) on a server really easy.

Correct.
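For anyone scripting this, here is a hedged Python sketch of the "change only the id" step. It assumes the preferences file is JSON with a top-level array holding one object per storage, each with an "id" field; that layout is an assumption here, so check it against your own file first. The function name `with_new_snapshot_id` and the sample values are made up for illustration.

```python
import json


def with_new_snapshot_id(preferences_text: str, new_id: str) -> str:
    """Return the preferences JSON text with every entry's "id" replaced."""
    # Assumption: a top-level JSON array with one object per storage,
    # each carrying an "id" key. All other fields are left untouched.
    entries = json.loads(preferences_text)
    for entry in entries:
        entry["id"] = new_id
    return json.dumps(entries, indent=4)
```

The function works on text only, so the copied file can be read, passed through it, and written back in whatever way suits the setup. Keep the new id unique among the repositories that share the storage.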

It’s already easy enough: run the init command and set keys :). Or init the repository once and back up all disks there. Is there any reason you need a separate repository per disk?

Thanks for your confirmation.

Agreed. Actually, our use case is to back up a storage cluster, where we access the virtual disks via SMB from an admin node running duplicacy.

You could then mount all of them into a single parent folder (or symlink the mountpoints into it) and back up that folder with a single duplicacy invocation, unless you need separate retention policies per mounted share.

Is the snapshot ID the same as what is referred to as backup ID in the Web UI?

Yes
