Couple questions while using Duplicacy

ideamaneric · 28 October 2018 15:45

Hi all! Just started using Duplicacy and I’m really enjoying how snappy it is. Just a few quick questions, though!

1. Is there anything other than the password that I need to keep safe? If my house burns down and I lose all my devices and servers, could I restore to a brand-new server with only the password? I’m backing up to Backblaze B2, by the way. Assume I know the account details and the password I used to encrypt the backups. Is there some database I don’t know about, or should I not worry?
2. How do I prune so that I keep a specific number of backups? Most prune configurations I’ve seen for Duplicacy are centered on “if it’s older than x days, delete it, and keep one every y days.” I want to keep twenty (incremental) backups, so that when I roll over to the 21st backup, the 1st backup is deleted.
3. How do I restore a specific file? Say I accidentally deleted /path/to/repo/foo/bar. Would I run duplicacy restore -r <revision_number> foo/bar in /path/to/repo? Would that work, or do I have to download the entire backup? Does the restore command accept wildcards?
4. How often should I run checks? Does Duplicacy lose chunks often? If one of the chunks goes missing, how screwed is my data?

I’ve looked through the documentation and am still confused about the four questions above. Any advice would be appreciated!

TheBestPessimist · 28 October 2018 16:54

The official documentation is linked from the Duplicacy User Guide.
Since you have a server, I’ll assume you have at least a little familiarity with computers, so might I suggest you test the commands yourself?

What you need to keep is easy to test: create a VM on your local PC and try a restore. That way you will see exactly what is needed.

As for keeping a fixed number of revisions, right now I don’t think that’s possible out of the box. You could write a script that parses the output of list (for example) and checks how many revisions you have.
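A rough Python sketch of such a script follows. It assumes each revision line printed by duplicacy list looks like "Snapshot <id> revision <n> created at ..." (verify against your own output), keeps the newest 20 revisions per snapshot ID, and only builds a duplicacy prune command using the -id and -r options instead of running it. The function names are hypothetical; check duplicacy prune -help before relying on the command line it produces.

```python
from __future__ import annotations

import re

# Assumed shape of a revision line printed by `duplicacy list`, e.g.
# "Snapshot mypc revision 7 created at 2018-10-28 03:00 -hash".
# Only the snapshot ID and the revision number are extracted.
REVISION_LINE = re.compile(r"^Snapshot\s+(\S+)\s+revision\s+(\d+)\b")


def parse_revisions(list_output: str) -> dict[str, list[int]]:
    """Group revision numbers by snapshot ID; non-matching lines are ignored."""
    revisions: dict[str, list[int]] = {}
    for line in list_output.splitlines():
        match = REVISION_LINE.match(line.strip())
        if match:
            snapshot_id, revision = match.group(1), int(match.group(2))
            revisions.setdefault(snapshot_id, []).append(revision)
    return revisions


def revisions_to_prune(revisions: list[int], keep: int = 20) -> list[int]:
    """Return every revision except the newest `keep`, oldest first."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(revisions)
    return ordered[: max(len(ordered) - keep, 0)]


def prune_command(snapshot_id: str, doomed: list[int]) -> list[str]:
    """Build (but do not run) a prune command deleting the given revisions."""
    if not doomed:
        # Refuse to build a prune command that names no revisions at all.
        raise ValueError("nothing to prune")
    command = ["duplicacy", "prune", "-id", snapshot_id]
    for revision in doomed:
        command += ["-r", str(revision)]
    return command
```

Once you are happy with what it selects, the list could be passed to subprocess.run(command, check=True) from the repository directory, ideally only after the next backup has succeeded.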

Generally, Duplicacy shouldn’t lose chunks. There may be problems if you run prune in exclusive and/or exhaustive mode from multiple machines, but otherwise nothing bad should happen.

You can run check after each backup if you want to be ultra-sure nothing went wrong, or you could run it once every 10 backups (I just run it once in a while, less often than every 10 backups). I’m not sure there’s a best practice for this. My best advice: since I assume this is a headless server, configure automated emails that send you the logs of the check operation, so that if anything bad happens at any point you can fix the problem quickly.
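One way to script that advice, sketched in Python under a few assumptions: a scheduler calls this once per run with an increasing run counter, duplicacy check is added every 10th run, and a non-zero exit code from check is treated as a failure. The function names and example addresses are hypothetical, and the message is only built here; sending it (for example with smtplib.SMTP(host).send_message(message)) depends on your mail setup.

```python
from __future__ import annotations

from email.message import EmailMessage


def commands_for_run(run_number: int, check_every: int = 10) -> list[list[str]]:
    """Commands for the n-th scheduled run (1-based); nothing is executed here."""
    if run_number < 1 or check_every < 1:
        raise ValueError("run_number and check_every must be >= 1")
    commands = [["duplicacy", "backup"]]
    if run_number % check_every == 0:
        # Every `check_every` runs, verify the storage after backing up.
        commands.append(["duplicacy", "check"])
    return commands


def check_report(log_text: str, exit_code: int, sender: str, recipient: str) -> EmailMessage:
    """Wrap a check log in an email whose subject flags success or failure."""
    # Assumption: a non-zero exit code from `duplicacy check` means a problem.
    status = "OK" if exit_code == 0 else f"FAILED (exit code {exit_code})"
    message = EmailMessage()
    message["Subject"] = f"duplicacy check: {status}"
    message["From"] = sender
    message["To"] = recipient
    message.set_content(log_text)
    return message
```

Putting the status in the subject line means a failed check stands out in the inbox without opening the log.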

ideamaneric · 28 October 2018 17:05

Currently, my computers are all underpowered, and I really don’t have the time to set up a VM and test. Also, downloads from B2 count against my quota, so I asked in the hope that someone else had already figured out what is required. I mean, relevant username and all that, but come on.

For all the other points, thank you for the useful information. Interesting that Duplicacy doesn’t have an option to keep only a specific number of revisions. Perhaps it could be implemented through a feature request?

TheBestPessimist · 28 October 2018 17:11

Yep, please add a request in #feature.

I use gdrive, and all I really need to remember (Duplicacy-related) is:

the Duplicacy folder on gdrive (aka the storage location)
the password of the storage (if encrypted)

I also need to remember my gdrive account credentials.

There’s no need to remember the repository ID, as it can be found in the storage under the snapshots folder.

One other thing Duplicacy doesn’t currently save is the contents of the filters file, so I guess that’s something to back up somewhere as well.

IanW · 28 October 2018 17:12

Moderator edit: this is incorrect; see replies 7 and 8 below.

Not sure I can answer your question, but equally I wonder whether it might be a case of “careful what you wish for”. I suspect that if you asked Duplicacy to keep only the last N revisions, it might make it impossible to retrieve the contents of a file that had not been modified during the time spanned by those revisions. I’m sure someone more knowledgeable will be able to confirm or deny that. Maybe that’s what you had in mind? A moot point if it’s not implemented, I suppose.

ideamaneric · 28 October 2018 17:13

Thank you! So my initial thoughts were kinda right. Marked the thread as solved.

ideamaneric · 28 October 2018 17:16

Hrm, I thought Duplicacy worked this way:

Make a file, foo/bar. It gets uploaded as chunk A, and snapshot 1 references chunk A. I don’t change foo/bar, so chunk A never changes, while other chunks get uploaded and referenced by later snapshots. Snapshot 21 would still reference chunk A, so even though snapshot 1 is deleted, only snapshot 1’s reference to chunk A goes away. Snapshot 21’s reference still stands, so chunk A is not purged when the prune command runs. I thought that was how Duplicacy’s revision system worked. Correct me if I’m wrong.
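The reference counting described in that post can be modelled in a few lines of Python. This is a deliberately simplified sketch: each revision is just a set of chunk IDs, and it ignores how Duplicacy actually removes chunks (its two-step fossil collection). The function name and chunk labels are hypothetical.

```python
from __future__ import annotations


def unreferenced_chunks(snapshots: dict[int, set[str]], deleted: set[int]) -> set[str]:
    """Chunks referenced only by the deleted revisions, i.e. safe to remove."""
    # Every chunk the deleted revisions pointed at.
    candidates = set().union(*(snapshots[r] for r in deleted))
    # Every chunk still pointed at by a surviving revision.
    surviving = set().union(*(c for r, c in snapshots.items() if r not in deleted))
    return candidates - surviving


# foo/bar never changes, so chunk "A" is shared by revisions 1, 2 and 21.
snapshots = {1: {"A", "B"}, 2: {"A", "C"}, 21: {"A", "D"}}
print(sorted(unreferenced_chunks(snapshots, {1})))     # ['B']
print(sorted(unreferenced_chunks(snapshots, {1, 2})))  # ['B', 'C']
```

Chunk "A" never shows up in the result, because revision 21 still references it, which is exactly why an unchanged file stays restorable after older revisions are pruned.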

TheBestPessimist · 28 October 2018 17:16

Not sure exactly what you mean, but that sounds incorrect: each Duplicacy backup records all the files that exist in the repository at the time it runs. That means a file which has never been modified since the first backup will still be available for restore, even from the latest backup.

The only way for a file to disappear is if you delete it from the repository and then run a backup (as expected, that backup doesn’t contain the deleted file).

Later edit: @ideamaneric’s answer just above mine is correct and explains how Duplicacy’s chunks work!

IanW · 28 October 2018 17:18

@TheBestPessimist: thanks for the correction, and @ideamaneric: sorry for the confusion. All clear now

towerbr · 28 October 2018 21:12

There is an indirect and not ideal way to keep N backups: match the prune settings to the frequency of your backups. For example, if you run your backup daily, you just need to set prune to delete backups older than 20 days. If you run it weekly, configure prune to delete backups older than 20 × 7 = 140 days, and so on.

It’s not entirely accurate, because if for some reason the backups don’t run at the expected frequency (for example, your machine was shut down for two days), you will temporarily keep fewer backups than desired.
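A small Python sketch of that arithmetic, plus a simplified simulation of the caveat. It assumes the age-based rule is expressed with prune’s -keep 0:<days> option (keep no revisions older than that many days) and treats a backup exactly <days> old as expired; how Duplicacy handles that boundary is an assumption here. The function names are hypothetical.

```python
from __future__ import annotations

from datetime import date, timedelta


def keep_option(revisions_wanted: int, interval_days: int) -> str:
    """Turn 'keep N revisions at one per interval' into an age cutoff."""
    if revisions_wanted < 1 or interval_days < 1:
        raise ValueError("both arguments must be >= 1")
    return f"-keep 0:{revisions_wanted * interval_days}"


def retained(backup_dates: list[date], today: date, max_age_days: int) -> list[date]:
    """Backups younger than the cutoff (boundary handling is an assumption)."""
    return [d for d in backup_dates if (today - d).days < max_age_days]


today = date(2018, 10, 28)
daily = [today - timedelta(days=k) for k in range(30)]         # 30 daily backups
missed = [d for d in daily if (today - d).days not in (3, 4)]  # machine off 2 days
print(keep_option(20, 1), keep_option(20, 7))  # -keep 0:20 -keep 0:140
print(len(retained(daily, today, 20)))         # 20
print(len(retained(missed, today, 20)))        # 18
```

The last line shows the caveat: with two missed days, the same 20-day cutoff temporarily leaves 18 revisions instead of 20.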