Filters for files in subdirectories

Hello.

I’m failing to set up filters that match particular files in subdirectories. The directory structure is similar to this:

|-- 55555555-aaaaaaa9
| |-- myfile.1531208949.something
| |-- some-file.1531208949.something
| |-- file-some-3e4f6b6a0443cfb3e3870114d2426141.1531225199.something
| |-- _my_file.1531208937.something
| |-- some-100322.1531611728.something
| |-- some-101178.1531697588.something
|-- aaaaaaaa-ffffffff
| |-- file.1531208949.something
| |-- my-file.1531208949.something
| |-- file-3e4f6b6a0443cfb3e3870114d2426141.file.something
| |-- _my_file.1531208937.something
| |-- file-100322.1531611728.something
| |-- file-101178.1531697588.something
| |-- file-101389.1531690354.something
| |-- file-103050.1531714900.something
| |-- file-105602.1531693303.something
| |-- file-106685.1531690334.something

I’d like to have a filter which would allow me to match a specific file inside the folder. This would allow me to back up every file in a separate snapshot and revision.

That sounds rather weird. What is your use case?

The filters should explicitly include all the folders up to the file you want to include, include the file, and then exclude everything else.

If my understanding of what you want is correct, e.g. to include only the single file located at a/b/c/d/abc.txt, I think the filter should look something like this:

+a/
+a/b/
+a/b/c/
+a/b/c/d/
+a/b/c/d/abc.txt
-*

More details found here: Filters/Include exclude patterns
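
Applied to your tree above, e.g. to keep only myfile.1531208949.something from the first folder, that would be something like this (an untested sketch of the same include-then-exclude pattern):

+55555555-aaaaaaa9/
+55555555-aaaaaaa9/myfile.1531208949.something
-*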

Thank you. The filter you suggested didn’t work for me.

As for the separate snapshot and revision: we need to be able to delete on demand any file we have in the backups. This is not quite possible if we back up an entire folder of files, so we’ve decided to back up file by file and tag the snapshots with the file name. That way we can easily identify and prune each one.
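
In practice the tagging looks roughly like this (a sketch; I believe both backup and prune accept a -t tag, with the file name as the tag value):

duplicacy backup -t myfile.1531208949.something
duplicacy prune -t myfile.1531208949.something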

What didn’t work?

Could you please run duplicacy (I assume the CLI) with the -d flag, save all the output to a file, and then upload the file for debugging?

duplicacy.exe -d backup > output.txt

(This writes the whole output to the file output.txt.)

we need to be able to delete on demand any file we have in the backups.

Depending on the business reason behind this requirement, it may not be possible to accomplish what you need at all: even if you have a separate snapshot per file and prune it completely, the file or parts of it may still be present in other existing chunks, possibly more than once (based on my current understanding of how duplicacy manages chunks).

“Filters” is always a slippery subject …

I have a job that only backs up a specific file (a VeraCrypt volume) and it works fine with:

i:(?i).*file.ext*$
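
If you want it anchored to one exact file, a tighter variant of the same idea might be (untested; the dots are escaped so they only match literal dots, and $ anchors the end of the path):

i:(?i)^.*myfile\.1531208949\.something$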

TheBestPessimist,

apologies, what you suggested actually works. At first I tried using a regex for the file name, but when using the full name the way you suggest, duplicacy backs up the file in the subdirectory. Thank you!

saspus,
still, after I prune the snapshots of a certain file, it will not be possible to restore it or read information from it in any way, is that correct?

I think that should be correct.

Imagine you have a total of two files to back up, each 1 MB and residing in its own repository.

I’m simplifying here, but it’s easy to imagine that they both could go into a single chunk, so that chunk contains data from both files.

Now you prune everything from the second repository.

That chunk will not be deleted because it also contains data for the first file.

So while the second file will likely not be visible through the API (because the snapshot file will not reference it), its actual data would still be there, and I’d imagine it would not be hard to recover it.
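
A toy sketch of that reasoning in Python (not Duplicacy’s actual code, just the reference-counting idea; the snapshot and chunk names are made up):

# Toy model: snapshots reference chunks; pruning a snapshot only
# deletes chunks that no remaining snapshot still references.
snapshots = {
    "repo1-rev1": {"chunk-A"},             # file 1's data sits in chunk A
    "repo2-rev1": {"chunk-A", "chunk-B"},  # file 2 shares chunk A, adds B
}

def prune(snapshot_id):
    pruned = snapshots.pop(snapshot_id)
    referenced = set().union(*snapshots.values()) if snapshots else set()
    return pruned - referenced  # only these chunks get deleted

print(prune("repo2-rev1"))  # {'chunk-B'}: chunk A survives, and with it
                            # the bytes of file 2 that were packed into it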

saspus, is this a guess/suggestion, or do you know for a fact that this is how duplicacy and snapshot pruning work?

I’m not a duplicacy developer, but this is based on my understanding of how duplicacy works from reading Lock free deduplication algorithm and in part from reviewing the source code. I can be wrong, of course.

Yes, I also understood that this is how duplicacy works. I think the open question is: what exactly needs to be done to recover the deleted file and whether that is acceptable within the policy or regulation you are trying to comply with.

To clarify: doing this is not part of how Duplicacy is meant to work. For duplicacy, this file is gone and (I believe) there is no way to restore it using duplicacy. However, as @saspus explained, chances are that it is still there and could be extracted using other tools. Comparable, in a way, to deleting a file on your hard drive (a real delete, not the recycle bin): even though it’s gone, there are recovery tools that might be able to restore it nonetheless.

My understanding of Duplicacy is the same (although I haven’t fully read the source code)…

When files are bundled together in a chunk, subsequently excluded files won’t get purged if the neighbouring files aren’t touched. However, I read somewhere that using the -hash option will trigger some kind of rebundling(?). I could be wrong about that, but I guess this can easily be tested.

Anyhow, the above strategy of excluding a certain file in one repository, and including only that file in a second repo, would work just fine on a fresh storage.

OK, so the only viable solution seems to be using a dedicated repository for every file. The repository in question will then share its lifecycle with the file (and the backups of that file).
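
With the CLI I imagine that per-file setup would look roughly like this (a sketch; the snapshot id, storage URL, and paths are placeholders):

cd /repos/myfile.1531208949.something
duplicacy init myfile-1531208949 sftp://user@host/backups
duplicacy backup -t myfile.1531208949.something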

Frankly, from what was described here, I think you don’t actually need a backup solution.
You need to copy the files to some sort of vault, so that nothing will be able to modify them, and you can only either delete the file(s) or replace them with a newer version.

I think duplicacy is unfit for what you desire, as what you desire is not a backup solution.
