List command details

SYNOPSIS:
   duplicacy list - List snapshots

USAGE:
   duplicacy list [command options]

OPTIONS:
   -all, -a                    list snapshots with any id
   -id <snapshot id>           list snapshots with the specified id rather than the default one
   -r <revision> [+]           the revision number of the snapshot
   -t <tag>                    list snapshots with the specified tag
   -files                      print the file list in each snapshot
   -chunks                     print chunks in each snapshot or all chunks if no snapshot specified
   -reset-passwords            take passwords from input rather than keychain/keyring or env
   -storage <storage name>     retrieve snapshots from the specified storage

The list command lists information about specified snapshots. By default it will list snapshots created from the current repository, but you can list all snapshots stored in the storage by specifying the -all option, or list snapshots with a different snapshot id using the -id option, and/or snapshots with a particular tag with the -t option.

The revision number is a number assigned to the snapshot when it is being created. This number will keep increasing every time a new snapshot is created from a repository. You can refer to snapshots by their revision numbers using the -r option, which takes either a single revision number (-r 123) or a range (-r 123-456). There can be multiple -r options.
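To make the effect of several -r options concrete, here is a minimal Python sketch that expands a list of revision specs into the revisions they select. It is only an illustration, not Duplicacy's own parsing code: the function name expand_revisions is hypothetical, and the sketch assumes that a range includes both endpoints.

```python
def expand_revisions(specs):
    """Expand specs such as ["123", "200-203"] into a sorted list of revisions.

    Assumption: a range "a-b" includes both a and b.
    """
    revisions = set()
    for spec in specs:
        if "-" in spec:
            start, end = (int(part) for part in spec.split("-", 1))
            if start > end:
                raise ValueError(f"invalid range: {spec}")
            revisions.update(range(start, end + 1))
        else:
            revisions.add(int(spec))
    return sorted(revisions)


# Roughly what "-r 123 -r 200-203" would select under the assumption above.
print(expand_revisions(["123", "200-203"]))  # [123, 200, 201, 202, 203]
```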

If -files is specified, for each snapshot to be listed, this command will also print information about every file contained in the snapshot.

If -chunks is specified, the command will also print out every chunk the snapshot references.

The -reset-passwords option is used to reset stored passwords and to allow passwords to be entered again. Please refer to the Passwords, credentials and environment variables section for more information.

When the repository has multiple storages (added by the add command), you can use the -storage option to specify which storage to list.

I normally use the Web UI. The list command is not directly supported there.
How do I use it from the command line? I tried this and it fails. The snapshot and storage names are correct.

C:\ProgramData\.duplicacy-web\bin>duplicacy_win_x64_2.7.2.exe list -id mySnapshotName -storage myStorageName

Repository has not been initialized

A post was split to a new topic: How do you specify the repository location

I was doing some statistics on how much space each backup adds, using the list of snapshots and revisions, the size of each chunk, and the list of chunks in each revision.
If you run duplicacy list -all -chunks >>\temp\listchunks.txt you get the full list. It turns out that the chunks in that list are not unique per snapshot and revision.
Is that expected? If the list of chunks is built by iterating over the files in each revision, then a chunk that contains fragments of many files will appear several times. But is it not possible to iterate directly over the chunks? And why is the date printed only to the minute? Why are the seconds omitted?

Yes. Duplicacy shares chunks between snapshots, so list -chunks will show many overlaps. Why does the precision of the timestamps matter? What are you trying to accomplish?

Suppose we define that a chunk “belongs” to the oldest revision that uses it. Then we can calculate the space delta of each revision as the total size of the chunks that “belong” to it. This requires a chronological order across all revisions, whatever the Id is. With several PCs writing to the same storage there will be collisions at hour:minute precision, and probably at seconds too. But seconds and milliseconds exist (at least in the file). Suppressing them is a deliberate decision, and it is not clear to me that anything is gained by it.

On my first attempt I got very large numbers, and the reason is the repetition of the chunks: you have to take only the unique ones for the sums. What I need (I already have it, despite those surprises with times and repetitions) is a report more or less like this:

Id  Revision  TotalSize  NewData
A   1         500        500
A   2         525        30
B   1         428        150
A   3         570        22
B   2         550        250
C   1         150        10
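A report of this shape could be computed along the following lines. This is only a sketch of the "a chunk belongs to the oldest revision that uses it" rule in Python. It assumes the chunk lists and chunk sizes have already been extracted into memory and that the revisions are already sorted oldest first. The names new_data_report, revisions and chunk_sizes, and the sample data, are all made up for the illustration.

```python
def new_data_report(revisions, chunk_sizes):
    """Compute (id, revision, total_size, new_data) rows.

    revisions:   list of (snapshot_id, revision, chunk_ids), oldest first
                 (assumption: the caller has already sorted them by time).
    chunk_sizes: dict mapping chunk id -> size.
    """
    seen = set()  # chunks already attributed to an older revision
    report = []
    for snapshot_id, revision, chunk_ids in revisions:
        unique = set(chunk_ids)  # a chunk may be listed more than once
        total_size = sum(chunk_sizes[c] for c in unique)
        new_data = sum(chunk_sizes[c] for c in unique - seen)
        seen |= unique
        report.append((snapshot_id, revision, total_size, new_data))
    return report


# Hypothetical sample data, not taken from a real storage.
chunk_sizes = {"c1": 300, "c2": 200, "c3": 30, "c4": 150}
revisions = [
    ("A", 1, ["c1", "c2", "c1"]),  # c1 is repeated on purpose
    ("A", 2, ["c1", "c2", "c3"]),
    ("B", 1, ["c1", "c4"]),
]

for row in new_data_report(revisions, chunk_sizes):
    print(*row)
```

Taking the set of chunk ids per revision before summing is what avoids the inflated totals caused by repeated chunks.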

Okay? But the check -stats command shows all this anyway. list is for listing chunks (and files, mainly).

Ok. Let’s go back to the original question: What purpose is served by repeating the chunks and truncating the dates?

Do you mean repeating the same chunks within the same snapshot ID, or between IDs?

Either way, I can’t see the problem - it’s programmatically listing the chunks for files it comes across. The same chunks may well crop up again when it does de-duplication for other files deeper in the tree. That’s kinda expected, and probably desirable.

To do otherwise would require memory resources for a map, which seems overkill when it can be post-processed, and list is just meant to dump raw data. (I’m not even sure how useful this info is in either version - without the context of which file we’re on - as that isn’t logged. If that extra context were included, removing repeated chunks would be counter-productive anyway.)
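As a rough idea of what such post-processing might look like, the Python sketch below removes repeated chunks per Id-revision pair while keeping the order of first appearance. It assumes the raw output has already been parsed into (snapshot_id, revision, chunk_id) rows; that parsing step, the function name and the sample rows are all hypothetical.

```python
from collections import defaultdict


def unique_chunks_per_revision(rows):
    """Group rows by (snapshot_id, revision) and drop repeated chunk ids.

    rows: iterable of (snapshot_id, revision, chunk_id) tuples
          (assumed to be parsed from the raw listing beforehand).
    """
    grouped = defaultdict(dict)  # dict keys keep first-seen order
    for snapshot_id, revision, chunk_id in rows:
        grouped[(snapshot_id, revision)][chunk_id] = None
    return {key: list(chunks) for key, chunks in grouped.items()}


# Hypothetical rows for illustration.
rows = [
    ("A", 1, "c1"), ("A", 1, "c2"), ("A", 1, "c1"),
    ("A", 2, "c1"), ("A", 2, "c3"), ("A", 2, "c3"),
]
print(unique_chunks_per_revision(rows))
```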

As for dates - why seconds were omitted, I don’t know (I didn’t code it). But why d’ya need such resolution in the first place? Order can (should) be determined by revision number (which is logged alongside the ID). Otherwise, it’s easy enough to change the code, make a PR, and compile your own binary.

Why doesn’t check -stats achieve what you want anyway?

I created a fork with those two changes:
List command with seconds and unique chunks

Does this only remove duplicates within each ID?

Personally, I’d have gone in the opposite direction and added verbose or debug lines for each file - so you know which chunks are associated with any file. Then remove only consecutive duplicates. Removing such information isn’t very helpful to me as a user, and check already has the useful stats.
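For comparison, removing only consecutive duplicates, as opposed to all duplicates, could be sketched in Python like this. The function name and the sample chunk ids are invented for the example; it works on a plain list of chunk ids and is not tied to Duplicacy's actual output format.

```python
from itertools import groupby


def drop_consecutive_duplicates(chunk_ids):
    """Collapse runs of the same chunk id; later reappearances are kept."""
    return [chunk_id for chunk_id, _run in groupby(chunk_ids)]


# "c1" comes back after "c2", so that second occurrence is preserved.
print(drop_consecutive_duplicates(["c1", "c1", "c2", "c1", "c3", "c3"]))
# ['c1', 'c2', 'c1', 'c3']
```

This keeps the information that a chunk is reused later in the listing, which is lost when every repeat is removed.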

check -stats “imply -all and all revisions”. That is OK sometimes, if you really need it, but not so good when your storage is old and big, with thousands of snapshot revisions and millions of chunks.

«Does this only remove duplicates within each ID?»

Not on each Id (snapshot) but on each Id-revision pair. It returns the list of chunks in each snapshot revision without the duplicates, which are just noise in the original.

To produce the list of all chunks in each file you would have to change the logic of the ListSnapshots function to alter the output when the -files and -chunks options are both present. Currently, the function produces the list of files if showFiles=true and then the output of chunks if showChunks=true.

It is not difficult to change that to something like this (pseudocode):

if showFiles and showChunks
   iterate files
      LOG_INFO(file..)
      iterate chunks in file
         LOG_INFO(chunk)
else   // current code
   if showFiles etc
   if showChunks etc