What is the easiest (= least time-consuming) way of comparing a list of files in a local repository with a list of files in the backup storage? (I’m on Windows, so I guess I’m asking for a PowerShell or command-line script.)
In order to avoid an XY problem, here is the more specific challenge I’m facing: A couple of months ago, I moved a number of directories from one directory tree to another, and something seems to have gone wrong in the moving process, because I recently noticed that quite a number of files are missing in the new location. I’m not sure what exactly happened (and it is irrelevant here), but it looks like it might have something to do with the path length, i.e. files whose path exceeded a certain length got lost. But that is just an aside.
So, before I go ahead and start restoring the missing files, I would like to get a better picture of which files are actually affected, i.e. I basically want a list of files that are available in the backup storage but missing in the local repository. The file dates and times can safely be ignored because none of the files has been modified in the past months.
Note that the local paths are no longer fully identical to the paths in the storage, but the differences are only at the beginning of the path. Something like this:
Current local paths:
c:\users\myself\project1\abc.txt
c:\users\myself\old\def.txt
c:\users\myself\project2\admin\ghi.txt
c:\users\myself\templates\admin\ghi.txt
Paths in snapshots (corresponding to previous local file locations):
users/myself/Box Sync/projects/project1/abc.txt
users/myself/Box Sync/projects/old/def.txt
users/myself/Box Sync/projects/project2/admin/ghi.txt
users/myself/Box Sync/templates/admin/ghi.txt
users/myself/Box Sync/templates/admin/comm/xyz.txt
(The duplicate file name is intentional.)
So the starting point would be the latter list of paths (though duplicacy’s list command doesn’t output them as neatly as that when I use duplicacy list -id ALPHA_C -files | select-string -pattern "/Box Sync/"), and then I would check for each path whether that file is currently present locally, somewhere.
So in the above example file lists, the result would be
users/myself/Box Sync/templates/admin/comm/xyz.txt
That is the only file that is present in the storage but not locally.
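To make the idea concrete, here is a rough sketch of the comparison I have in mind, written in Python rather than PowerShell simply because it is easy to test. It assumes both file lists have already been reduced to plain paths; extracting the path from each line of the duplicacy output is deliberately left out because I don't want to guess at the exact line format. PREFIX_MAP and the function names are made up for this example; the prefixes are taken from the sample paths above.

```python
# Sketch: report backup paths that have no counterpart in the local listing.
# PREFIX_MAP is a hypothetical mapping derived from the sample paths above.
PREFIX_MAP = [
    # (prefix in the backup listing, corresponding local prefix), most specific first
    ("users/myself/Box Sync/projects/", "c:/users/myself/"),
    ("users/myself/Box Sync/", "c:/users/myself/"),
]


def normalize(path):
    """Use forward slashes and lower case (Windows paths are case-insensitive)."""
    return path.replace("\\", "/").lower()


def to_local(backup_path):
    """Translate a backup path to the expected local path, or None if no prefix applies."""
    path = normalize(backup_path)
    for backup_prefix, local_prefix in PREFIX_MAP:
        prefix = normalize(backup_prefix)
        if path.startswith(prefix):
            return normalize(local_prefix) + path[len(prefix):]
    return None


def missing_locally(backup_paths, local_paths):
    """Return the backup paths whose translated path is not in the local listing.

    Backup paths that match no prefix are reported as missing as well,
    so they can be checked by hand.
    """
    local = {normalize(p) for p in local_paths}
    return [p for p in backup_paths if to_local(p) not in local]


# Sample data from the lists above.
local_paths = [
    r"c:\users\myself\project1\abc.txt",
    r"c:\users\myself\old\def.txt",
    r"c:\users\myself\project2\admin\ghi.txt",
    r"c:\users\myself\templates\admin\ghi.txt",
]
backup_paths = [
    "users/myself/Box Sync/projects/project1/abc.txt",
    "users/myself/Box Sync/projects/old/def.txt",
    "users/myself/Box Sync/projects/project2/admin/ghi.txt",
    "users/myself/Box Sync/templates/admin/ghi.txt",
    "users/myself/Box Sync/templates/admin/comm/xyz.txt",
]

for path in missing_locally(backup_paths, local_paths):
    print(path)
```

With the sample lists this prints only the xyz.txt path. Translating the prefix and comparing full paths keeps the two ghi.txt files apart, which a comparison by file name alone would not.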
The solution does not have to be perfect and fully automatic. I can handle a number of edge cases manually. And neither do I expect anyone to write an entire script. I’d be happy to learn about strategies for tackling this more generally.