How can I find which chunks make up a file, and vice versa?

Hello,

I had a “problem” with duplicacy today. I added several files and they would not upload to the server. After much searching, I found out that they were already in the storage, uploaded from another computer that uses the same storage for backup.
A duplicacy command that maps a specific file to its chunks, and the other way around (which files a specific chunk is part of), would greatly help me debug my issue. Any idea how to get this done?

Thank you.


What problem are you trying to debug?

It seems to me it worked as designed. The data chunks already existed in the storage, so no bandwidth was wasted uploading them a second time. A new snapshot was created referencing the existing chunks.

This is one of the features of duplicacy.

I understand that this is a feature of duplicacy and I didn’t want to bother the forum with my personal story. But if you ask…
I recently got about a hundred image files (around 1.3 GB) from my wife and put them on my home server. I also made several changes to the directory structure (never change two parameters at the same time!). The next day (I run a daily backup), I found that my server had not backed up the expected amount. So I started researching. “check” returned no errors. “diff” returned the expected results. “cat” showed that the file content was as expected. “backup” (with --debug, of course) said there was nothing to back up. I remembered that my wife’s laptop backs up to the same storage, and maybe the files had already been uploaded to the server through her backup. So I checked the contents of her backups, but nothing was there. Two hours later, I was at a loss.
Then I remembered that my desktop machine also backs up to the same storage, and there I found the first copy of the files. After some questioning, I found out that my wife had put the picture files on my computer (a general name for my desktop and the home server) because she thought it would be safer(?!). duplicacy backed up the files from my desktop, so the server backup had nothing left to do.
I could have saved myself those two hours if I could have asked duplicacy which chunks one of the files uses, and then asked it again which files in the storage use one of those chunks. That’s all.
No need to explain the features of duplicacy to me; this is why I use this great backup solution. I was just wondering whether it’s possible to do what I was asking for.
Thanks again.


I still don’t understand why you spent two hours.

duplicacy list --files can show you which files are included in a specific revision. You could have run it on the server and stopped right there once you saw that the files were backed up.

Still, I don’t understand what you are trying to do here. Find out which client uploaded which chunk? Just for the heck of it?

Anyway, this thread discusses how to do it, but it won’t be worth your time: Listing revision contents, need chunk lists

“duplicacy list --files” on a storage with several repositories, thousands of revisions and many thousands of files is a long operation that creates a big log file.
An API that lets me trace a file to its chunks, and from those chunks back to the files, would be much faster.
But thank you for the pointer to the discussion about the subject.

I meant running list --files on the latest revision only, to confirm that the files in question were picked up. How they got picked up is immaterial.
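For a check like this, a small script can list a single revision and report which of the expected paths are absent. This is a sketch that assumes the `duplicacy` binary is on `PATH`, that it is run from the repository directory, and that the CLI accepts `list --files -r <revision>` (the `-r` revision flag is an assumption here, not something stated in this thread). The revision number and file names are made up, and the match is a plain substring search, so it does not parse Duplicacy's output format.

```python
import subprocess


def list_revision_files(revision: int, repository_dir: str = ".") -> str:
    """Run `duplicacy list --files -r <revision>` and return its text output.

    Assumes the duplicacy CLI is installed and repository_dir is an
    initialized duplicacy repository.
    """
    result = subprocess.run(
        ["duplicacy", "list", "--files", "-r", str(revision)],
        cwd=repository_dir,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if duplicacy fails
    )
    return result.stdout


def missing_paths(listing: str, paths: list[str]) -> list[str]:
    """Return the paths that do not occur anywhere in the listing text.

    Crude substring check: "a.jpg" would also match "a.jpg.bak".
    """
    return [p for p in paths if p not in listing]


if __name__ == "__main__":
    # Hypothetical revision number and file names, for illustration only.
    listing = list_revision_files(123)
    print(missing_paths(listing, ["Pictures/img_0001.jpg", "Pictures/img_0002.jpg"]))
```

An empty result means every expected path shows up in that revision, which is the "yep, these files were backed up" answer without tracing any chunks.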

What would you do with that information, though? You would need to parse the backup logs on every client to find out which one was responsible for uploading a specific chunk. And then there is a whole other can of worms with deduplicated files, pieces of which could have been uploaded by different clients.

The ultimate outcome would be “yep, these files were backed up”, which is exactly what list --files could have told you.

You’re absolutely right, of course, that a way to determine which chunks pertain to which files, and vice versa, would be a very useful debugging tool.

This is one of the features that struck me as very obviously lacking when I first started using Duplicacy and, while it’s possible to use Duplicacy normally without it, when things go wrong…

Missing or corrupted chunks, inexplicable data growth, tracking duplication or determining source; it’d save a lot of time if you could find out instantly, directly from the storage, as opposed to trawling through multiple client logs on multiple systems (IF those systems are still accessible coz, y’know, systems fail and Duplicacy is there as a backup).

Plus, the metadata is obviously there. So +1 command / API.
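To illustrate what such a command might compute: as I understand Duplicacy's snapshot format, each revision stores an ordered list of chunk hashes, and each file entry records the index of the first and last chunk its content spans. Because files are packed together before being split into chunks, one chunk can hold pieces of several files. The sketch assumes that layout and uses a hypothetical, simplified `FileEntry` type instead of Duplicacy's real data structures (it does not read a storage). It builds the forward map (file to chunks) and the reverse map (chunk to files) for one revision.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileEntry:
    # Hypothetical, simplified file entry: the file's content spans
    # chunks start_chunk..end_chunk (inclusive) of the revision's chunk list.
    path: str
    start_chunk: int
    end_chunk: int


def build_indexes(
    chunk_hashes: list[str], entries: list[FileEntry]
) -> tuple[dict[str, list[str]], dict[str, set[str]]]:
    """Build file -> chunks and chunk -> files maps for a single revision."""
    file_to_chunks: dict[str, list[str]] = {}
    chunk_to_files: dict[str, set[str]] = defaultdict(set)
    for entry in entries:
        # Slice end is exclusive, so add 1 to include end_chunk.
        chunks = chunk_hashes[entry.start_chunk : entry.end_chunk + 1]
        file_to_chunks[entry.path] = chunks
        for chunk in chunks:
            # A chunk shared by adjacent files maps back to all of them.
            chunk_to_files[chunk].add(entry.path)
    return file_to_chunks, dict(chunk_to_files)


if __name__ == "__main__":
    # Made-up chunk hashes and files: b.jpg starts in the chunk where a.jpg ends.
    chunks = ["c0", "c1", "c2", "c3"]
    entries = [FileEntry("a.jpg", 0, 1), FileEntry("b.jpg", 1, 3)]
    f2c, c2f = build_indexes(chunks, entries)
    print(f2c["b.jpg"])         # ['c1', 'c2', 'c3']
    print(sorted(c2f["c1"]))    # ['a.jpg', 'b.jpg']
```

Finding every snapshot that references a given chunk would mean building these maps for every revision of every snapshot ID and keying files by (snapshot ID, revision, path). That is roughly the same amount of work as running list --files across the whole storage, which is why an index kept by Duplicacy itself would be the faster option.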


Years ago I had the same question as this topic (how to relate files and chunks), and the answer, which is not so simple, is in this old topic:

More specifically:
