Listing revision contents, need chunk lists

Ok… so I’m trying to follow, but for some reason no matter what I do, even when referencing the storage by name and including the backup ID, it still keeps telling me the repository hasn’t been initialized…

~ # /config/bin/duplicacy_linux_x64_2.7.2 cat -storage REDACTED -id REDACTED -r 5
Repository has not been initialized

And this is being done from inside the container. I’ve tried from the home directory, from the config directory, even from the source directory for the backups… always the same thing.

What am I missing?

Look at any backup log – in the first few lines you’ll see a path. cd there and then run the command.

Still no good… here’s my log:

/logs # cat backup-20220215-040001.log
Running backup command from /cache/localhost/3 to back up /source/REDACTED/data
Options: [-log backup -storage REDACTED -threads 8 -stats]
2022-02-15 04:00:01.197 INFO REPOSITORY_SET Repository set to /source/REDACTED/data
2022-02-15 04:00:01.200 INFO STORAGE_SET Storage set to s3://REDACTED
2022-02-15 04:00:14.872 INFO BACKUP_START Last backup at revision 4 found
2022-02-15 04:00:14.872 INFO BACKUP_INDEXING Indexing /source/REDACTED/data
2022-02-15 04:00:14.872 INFO SNAPSHOT_FILTER Parsing filter file /cache/localhost/3/.duplicacy/filters
2022-02-15 04:00:14.872 INFO SNAPSHOT_FILTER Loaded 2 include/exclude pattern(s)
2022-02-15 04:33:36.121 INFO BACKUP_THREADS Use 8 uploading threads

From there it lists the files being uploaded, then confirms revision 5 has been created.

However, when I try to run the cat command again… still the same thing:

/source/REDACTED/data # /config/bin/duplicacy_linux_x64_2.7.2 cat -storage REDACTED -id REDACTED -r 5
Repository has not been initialized

I even tried it with almost no options, as mentioned before… it still comes up with nothing.

cd <wherever /cache is mounted at>/localhost/3

Or if running in the container — cd /cache/localhost/3

There must be a .duplicacy folder there. If there isn’t, you’re in the wrong location.
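For example, going by the log above, a quick check from inside the container (the exact folder contents will vary):

/cache/localhost/3 # ls .duplicacy
cache  filters  preferences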

YES!!! You nailed it! I did it from the cache directory and it dumped nearly 310MB of data on me. It’s actually 7,624,312 lines of data!!! Hahahaha

Now the real question is… what can I search for in this giant file to tell me which chunks I need to thaw to get the file list working for this particular revision?!

And if I haven’t said it already… thank you SO much for the help! You’re killin it!! Expect some donations, project work, anything I can contribute, I’m in!!!

I think a problem is already hiding here: to produce that output, duplicacy had to reconstruct the full snapshot, and that involves downloading snapshot chunks. Those chunks can themselves be frozen.

We need another CLI command that dumps the references from the snapshot file – to know which chunks to thaw to get the snapshot itself, which can then be used to figure out which file chunks are needed.

But then it becomes a two-step process, which is not good in itself.

Keeping metadata chunks in a separate directory would, ultimately, be the proper solution.

This would allow even disk pooling software (DrivePool, mergerfs) to duplicate such a meta directory for increased redundancy.

I agree, I’d love to see that happen… and help where I can!

The thinking in my brain is to follow the process the code takes to get the file listings… I’m just trying to get it to list the files that are in each revision. In the web GUI, when you go to restore and select your backup, your revision, etc., it goes to populate the file list, and in doing so it obviously attempts to download chunks. I saw the first one it downloaded… so I thawed it; now it gets that one and I can see the second one it’s trying to download… so now I’m thawing that.

So in the code, how does it know which chunks it’s going to need to get to give you that file list?

If I can step through this manually, I can develop a logical process for what would need to be done to make that part of the GUI function properly with Glacier Deep Archive… then we can move on from there. I’m even taking some online coding classes so I’ll be able to develop some of these things myself and submit a PR… but I’m a little too junior to understand what the code is doing right now.

A snapshot/revision file is just a json serialization of this Snapshot struct:
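(Condensed sketch; the full definition lives in src/duplicacy_snapshot.go, and the fields below are paraphrased from it, so treat the exact names as approximate.)

type Snapshot struct {
    ID        string // backup id
    Revision  int    // revision number
    Options   string
    Tag       string
    StartTime int64
    EndTime   int64

    // Each sequence is a list of chunk hashes. Concatenating the
    // contents of those chunks yields the json form of Files,
    // ChunkHashes, and ChunkLengths respectively.
    FileSequence   []string
    ChunkSequence  []string
    LengthSequence []string

    Files        []*Entry // every file/directory in this revision
    ChunkHashes  []string // chunks holding the actual file data
    ChunkLengths []int    // length of each of those chunks
}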

This file is compressed if the storage is not encrypted (and encrypted otherwise), so either way you can’t read it as plain text.

You can’t find the list of metadata chunks directly in this file. Instead, there is one extra level of indirection involving FileSequence, ChunkSequence, and LengthSequence. FileSequence, for example, is a list of chunk hashes; those chunks are the metadata chunks that make up the file list. To get the file list, you must download all the chunks in FileSequence, concatenate them in order, and then deserialize the result from json into a list of Entry structs.
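Once you have the revision file decoded to json, those three sequences are exactly the chunk hashes you’d need to thaw before any file listing can work. Here is a rough sketch in Go of pulling them out, assuming you already have the plain (decompressed, decrypted) json; the key names are my guess at the serialized form, so verify them against a real file. Note also that a chunk’s file name on the storage may be a keyed hash derived from these values via the config file, rather than the literal hex string.

package main

import (
    "encoding/json"
    "fmt"
    "os"
)

// Minimal view of a decoded revision file: only the three metadata
// sequences matter for thawing. The json key names ("files",
// "chunks", "lengths") are assumptions to be checked.
type revision struct {
    ID             string   `json:"id"`
    Revision       int      `json:"revision"`
    FileSequence   []string `json:"files"`
    ChunkSequence  []string `json:"chunks"`
    LengthSequence []string `json:"lengths"`
}

func main() {
    // Argument: path to an already-decompressed revision file.
    data, err := os.ReadFile(os.Args[1])
    if err != nil {
        panic(err)
    }
    var rev revision
    if err := json.Unmarshal(data, &rev); err != nil {
        panic(err)
    }
    // Every hash printed here belongs to a chunk that must be thawed
    // before duplicacy can rebuild the file list for this revision.
    for _, seq := range [][]string{rev.FileSequence, rev.ChunkSequence, rev.LengthSequence} {
        for _, hash := range seq {
            fmt.Println(hash)
        }
    }
}

Thawing all of those in one batch would avoid the run-fail-thaw-repeat loop of one request per chunk.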

Ok… this somewhat makes sense to me… so how do I start? Is being able to cat the revision into that 7 million line file helpful at all? How do I figure out which chunks I need to thaw, download, concatenate, etc…?

I’m really struggling here to get this done on AWS S3 Deep Archive, but deep down I know we can make it work, I just need a little more help. Would showing you parts of the “duplicacy cat” contents help you point me in the right direction… ?

Another question that might help, maybe… how does the duplicacy software know which chunks to grab for the metadata it needs to populate the file list? I just need to get that list in advance so I can thaw them before trying to list the files in the revision… the code stops after the first download error… if it would just print out all the chunks I need instead of stopping at the first error, I’d get the list that way… for now I’m trying to do that manually.

The problem is, without that… let’s say it takes 6 chunks to get the backup metadata… I have to run it, get an error, thaw that chunk… run it again, get the next error, thaw the next chunk… etc… with 6 chunks it could end up taking me 3 days just to be able to list the files in a backup revision.

Ok… any chance you can help me here… all of a sudden now even my new backups are failing and I have no idea where to start…

What file can I start downloading? How can one uncompress it, and how can I concatenate things together to start getting the list…

My trial license expired, so I bought one and applied it… and now it won’t restart any of the backups or run checks… it seems to want to start by downloading these chunks… and it’s going to take forever. Just a little more help would be great.

Is there any way to identify metadata chunks when they’re uploaded?

I’m really stuck at a standstill here. Is there more logging I can enable somehow to see what’s different?