Deduplication question: I moved a lot of backed-up data from local drives to a NAS


I had a Windows machine with multiple hard drives and was backing up 10 repositories from those drives to the same OneDrive storage (using the Windows GUI). Now I've moved all of that data to a new NAS, and it's available from the Windows machine as a share \\mynas\consolidated.

If I install the Web UI on that Windows machine and create a new, single repository to back up the network share to the same storage, will it realize that most of that stuff is already backed up and only upload the difference? And should I (or should I not) get rid of the multiple .duplicacy folders in the consolidated location?


Yup, it will de-duplicate the vast majority of it. You'll notice a handful of chunks at the boundaries between the old repository directories won't match, and if you rearranged those folders significantly a few more won't either (even then, it'll do a good job de-duplicating as much as possible). So not 100%, but close to it.

Use the -dry-run option to see what it would do.
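If you're testing from the CLI, that would look something like the following, run from the root of the new repository (the Web UI passes the same option through the backup job's Options field):

```shell
# Run from the repository root (the folder containing .duplicacy).
# -dry-run uploads nothing: it only reports which chunks already
# exist in the storage and which would need to be uploaded.
# -stats prints a summary of uploaded vs. skipped data at the end.
duplicacy backup -stats -dry-run
```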

Note that if the new repository name is different from any of the old ones, it'll automatically perform a fresh backup with -hash and take longer, but it'll still de-duplicate.


Thanks, this is helpful.
However, I'm still not sure whether I can delete the old .duplicacy folders or not.
Also, what should I do to prune/get rid of the old repositories while switching to a single new one?


Ah, sorry. Yes, you don't need those old .duplicacy directories with the new Web UI…

Apart from any filters file you may have manually created in there, they can easily be recreated* anyway (with your storage password etc.) if you ever need to go back to the CLI. So yes, you can delete them.

If you run a regular prune job with the -all flag, the snapshots for those old repositories will eventually get pruned as well, apart from the oldest one.
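As a sketch, such a prune job might look like this from the CLI; the -keep retention values here are purely illustrative, so pick your own:

```shell
# -all (-a) considers snapshots from every repository on the storage.
# -keep n:m means: for snapshots older than m days, keep one every
# n days (n = 0 deletes them entirely). Values below are examples only.
duplicacy prune -all -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
```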

I'd just do that and wait until you end up with one snapshot for each old repository. Then you'll need to run a prune with the options -id <old_repo_id> -r <last_revision> -exclusive to get rid of them, making sure no backups are running at the time.
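For example (where old_repo_id and revision 42 are placeholders for your actual old snapshot ID and its last remaining revision number):

```shell
# Only run this while no backups or other prune jobs are active:
# -exclusive assumes exclusive access to the storage and deletes
# chunks immediately instead of collecting them as fossils first.
duplicacy prune -id old_repo_id -r 42 -exclusive
```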

Finally, from what I understand, your prunes won't fully work for the next 7 days or so after you last backed up those old repositories. This is because Duplicacy expects a successful backup to occur on every repository found in the storage between when fossils are collected and when they're ultimately deleted.

After 7 days, those repos get ignored and 'unblock' prunes for the rest of the repos; the old repos still get pruned over time. See Prune (last paragraph). Just something to expect.

Edit: *unless you want to keep the \logs folder?


I didn't find any prune-related settings in the new Web UI. Does that mean I have to do it manually from the CLI? It's been more than 7 days since the old server was shut down and the "old" backups stopped running on schedule.


For the Web UI version, you'll need to add a new schedule (green + button at the bottom of the Schedule screen) and/or add a Prune job to an existing schedule.

At the moment, you have to put either -id <repo_id> or -a (I suggest the latter, for all repos) in the Options field, as otherwise it supposedly does nothing.
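So the Options field for the prune job might contain a line like the one below (the -keep retention values are just an example; adjust to your own policy):

```
-a -keep 0:360 -keep 30:180 -keep 7:30 -keep 1:7
```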

Since it's been more than 7 days, that should be fine. Chunks will be collected for pruning on the first run and deleted on the second, but only after a backup of your new repository has run in between.