Remove specific files from the storage

Christoph · 14 October 2018 08:34

One limitation of the way duplicacy works is that you can’t remove (or preserve, for that matter) specific files from your backup. I’d be curious to understand what kind of operations would be necessary to make this possible. In other words: what would a pruning option need to do to remove a specified file from the storage?

Some initial (high-level and amateurish) thoughts:

1. Identify all snapshots that include the file in question (based on its path, so if the file has been renamed, identifying those “versions” would need to be done manually).
2. Identify the chunks belonging to the file (candidates for deletion).
3. Determine whether any other snapshots refer to any of the candidates and remove those chunks from the candidate list.
4. Determine whether any of the candidate chunks are used by any other files in the snapshots containing our file (and remove those from the list too).
5. Fossilize the remaining candidate chunks.

From here the process would be the same as ordinary pruning, i.e. the chunk files would not be removed immediately but during one of the subsequent prune runs.
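The candidate selection described above can be sketched as plain set arithmetic. This is a toy model under stated assumptions, not Duplicacy's actual storage format: each snapshot is represented as a mapping from file path to a list of chunk IDs, and `removable_chunks` is a hypothetical helper name. It only identifies the chunks that could be fossilized; it does not touch any storage.

```python
from typing import Dict, List, Set

# Toy model (an assumption, not Duplicacy's on-disk format):
# revision number -> {file path -> ordered list of chunk IDs}
Snapshots = Dict[int, Dict[str, List[str]]]


def removable_chunks(snapshots: Snapshots, target: str) -> Set[str]:
    """Return the chunk IDs referenced only by `target` across all snapshots."""
    candidates: Set[str] = set()
    still_needed: Set[str] = set()
    for files in snapshots.values():
        for path, chunks in files.items():
            if path == target:
                # Chunks belonging to the file in question: deletion candidates.
                candidates.update(chunks)
            else:
                # Chunks used by any other file, in any snapshot, must stay.
                still_needed.update(chunks)
    return candidates - still_needed


snapshots = {
    1: {"a.txt": ["c1", "c2"], "secret.key": ["c2", "c3"]},
    2: {"a.txt": ["c1", "c2"], "secret.key": ["c3", "c4"], "b.txt": ["c4", "c5"]},
    3: {"a.txt": ["c1", "c2"], "b.txt": ["c4", "c5"]},
}
print(sorted(removable_chunks(snapshots, "secret.key")))  # ['c3']
```

In this example `c2` and `c4` are shared with other files, so only `c3` is left as a candidate. That also shows why removing a file may free less space than its size suggests.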

Droolio · 14 October 2018 13:37

I think steps 2-5 would be unnecessary as these are done as part of the normal prune process…

The real issue is to identify the snapshot revisions that mention the candidate file and then to rewrite the revision without that file included. However, I believe that if the file is part of a chunk sequence that includes other files, that’s where things might get a bit complicated.

Could it restore that sequence of files to a temporary cache and then repack them? Perhaps.
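To illustrate why a shared chunk sequence complicates things, here is a minimal sketch of the restore-and-repack idea. It assumes a simplified packing model in which files are concatenated in path order and split into fixed-size chunks; as far as I understand, Duplicacy uses variable-size chunking, so the real reuse rate would differ. The `pack` function and the file contents are made up for the example.

```python
import hashlib
from typing import Dict, List


def pack(files: Dict[str, bytes], chunk_size: int = 8) -> List[str]:
    """Concatenate files in path order and split the stream into chunks.

    Returns a short hash per chunk. Fixed-size splitting is a simplification.
    """
    stream = b"".join(files[path] for path in sorted(files))
    return [
        hashlib.sha256(stream[i:i + chunk_size]).hexdigest()[:8]
        for i in range(0, len(stream), chunk_size)
    ]


files = {"a.txt": b"A" * 12, "b.txt": b"B" * 6, "c.txt": b"C" * 12}
before = pack(files)

# Drop b.txt and repack the remaining files, as a temporary restore would.
remaining = {path: data for path, data in files.items() if path != "b.txt"}
after = pack(remaining)

reused = [chunk for chunk in after if chunk in before]
new = [chunk for chunk in after if chunk not in before]
print(f"chunks before: {len(before)}, after: {len(after)}")
print(f"reused: {len(reused)}, newly written: {len(new)}")
```

Here only the first chunk, which lies entirely before the removed file, is reused; every chunk that touched `b.txt` or came after it has to be written again. A rewritten revision would therefore need new chunks for the neighbouring files, not just a shorter file list.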

Afterwards, a normal prune operation with -exhaustive would remove the relevant chunks.

I’ve been thinking about a workaround hack that would cover a lot of scenarios, including this one: a script that could repack an entire storage (or rather, create it anew) by restoring each revision, one by one, to a temporary space, setting the system time to the snapshot time, backing up to a new storage, and repeating until done. A script would be a little crude but quite straightforward.

Since Duplicacy does incremental restores, this wouldn’t take as long as one might imagine. And you could use it, for example, to repack a whole backup history into new chunk sizes. Or import a set of backups into a copy-compatible storage (if it originally wasn’t made compatible). Or remove file(s) from history.
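The restore-and-re-backup loop described above could be sketched roughly as follows. This is only a plan builder that returns the commands in order, not a tested migration tool. It assumes the temporary repository has already been initialised against the old storage and that a second storage named `repacked` (a hypothetical name) has been added to it. The flags `-r`, `-overwrite`, `-delete` and `-storage` are written from memory of the Duplicacy CLI and should be checked against its documentation. The `set-clock` entry is a placeholder, not a real command, since setting the system time is platform-specific and needs admin rights.

```python
from typing import List


def repack_plan(revisions: List[int], new_storage: str = "repacked") -> List[List[str]]:
    """Build the command sequence for replaying revisions into a new storage."""
    plan: List[List[str]] = []
    for rev in sorted(revisions):
        # Restore this revision into the temporary repository. Restores are
        # incremental, so only files that differ should be rewritten.
        plan.append(["duplicacy", "restore", "-r", str(rev), "-overwrite", "-delete"])
        # Placeholder step (hypothetical, not a real command): set the system
        # clock to the creation time of this revision before backing up.
        plan.append(["set-clock", f"<creation time of revision {rev}>"])
        # Back up the restored tree to the new storage.
        plan.append(["duplicacy", "backup", "-storage", new_storage])
    return plan


for step in repack_plan([3, 1, 2]):
    print(" ".join(step))
```

To remove a file from history, the unwanted path would be deleted from the temporary space (or excluded by a filter) between the restore and backup steps. Building the plan as data first makes it easy to review the sequence before running anything against a real storage.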

Obviously it would be super nice if Duplicacy could manipulate backup storage directly, but I can definitely see it being quite complex.