A few small questions

Hey All,
I'm currently evaluating Duplicacy and it's looking like I'll be going with it, as it's the best of all the tools I have evaluated. I do however have a few questions I'm hoping someone can help with.

  1. Say I have a 1 TB drive. Rather than backing it all up at once, I back it up folder by folder (gradually removing the folder filters I had set before running each backup). Is it better to get it all in the first initial backup, or is it OK to do it this way? Any disadvantages? Since I shut down/use the machine, I did it this way so each run was a smaller hit.

  2. VeraCrypt – are there any issues with using Duplicacy with VeraCrypt-mounted volumes? (The volumes I'm backing up from are mounted VC volumes.) Arq had issues with this that they weren't willing to fix after I talked to support. I've had no issues so far, but thought I'd ask in case someone has.

  3. Also, I only mount my volumes when I want to use them, so they stay unmounted 90% of the time. If I start Duplicacy while they're unavailable, I assume this will be fine? My testing hasn't shown any issues so far. I will, however, mount them before I run a backup.

  4. I'd also like to understand the "Copy-Compatible" checkbox more. I believe it's there so a backup of one source can go to two destination locations: rather than backing up to both locations, it backs up to one, and you can copy to the other using a "copy" job set up under Schedules. So you don't need to back it up "twice", i.e. it just copies the extra data between locations? (Just want to make sure I'm correct.)

Sorry for the long post. I read the lock-free de-duplication wiki, but just wanted to understand these questions more.

Also, I love the product and community so far. I've read so many forum posts and I like that there is an active community. Took me a while to get my head around -keep/pruning, but reading the forum sorted that out for me :).

Thanks

Welcome to the community!

  1. You can; I don't see any disadvantages, other than the first few revisions will contain partial data, but I don't see any benefits either.
  2. Duplicacy uses the filesystem API to stat, open, and read files. This is very basic stuff. What sorts of issues did you have with Arq? Arq does a bit more than that – it allows user impersonation to back up storage mounted by another user, and this may introduce some incompatibility with half-baked filesystem implementations.
  3. It depends on what you unmount. If the root folder that is being backed up is unmounted, Duplicacy may fail. If you unmount subfolders, they will be backed up as empty. You can always write a pre-backup script that checks whether your directory exists and is an actual mount point, or that a folder under the mount point exists, and aborts the backup by returning a non-zero exit code. That would be a cleaner approach (see the sketch after this list).
  4. Copy-compatible allows you to copy data between storages. There is also bit-identical, in which case even the encryption keys are identical, and you can copy storages using third-party copy tools like rclone without needing to decrypt/re-encrypt data, as would be the case with duplicacy copy.
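
Regarding point 3, here is a rough sketch of what such a pre-backup check could look like in Python (the drive letter and marker file are placeholders; adapt them to your own mount points):

```python
# Rough sketch of a pre-backup check: exit with a non-zero code if the
# volume is not mounted, so the backup is aborted instead of running against
# an empty or missing folder. Paths below are examples only.
import os
import sys

MOUNT_POINT = "Z:\\"                                   # example: mounted VeraCrypt drive
MARKER = os.path.join(MOUNT_POINT, ".backup-marker")   # small file kept inside the volume

# On Linux/macOS you could use os.path.ismount(MOUNT_POINT); for a Windows
# drive letter, checking for a known file inside the volume is simpler.
if not os.path.isdir(MOUNT_POINT) or not os.path.isfile(MARKER):
    print(f"{MOUNT_POINT} does not look mounted, aborting backup", file=sys.stderr)
    sys.exit(1)   # non-zero exit code aborts the backup

sys.exit(0)
```

For the CLI, such a script would go into the .duplicacy/scripts folder as the pre-backup script; the only thing that matters is that it exits non-zero when the volume is missing.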

Hi @Akita and welcome to the forum!

I do this too for large backups to slow destinations. As far as I know there should be no down-sides and I haven’t experienced anything negative.

Personally, I would love to see Duplicacy implement periodic (perhaps hourly) in-progress snapshots during a large backup. I have had numerous issues where the internet went down for a few minutes during a multi-day backup. At that point, Duplicacy didn't save an in-progress, incomplete backup and I had to rehash all the data.

It would also avoid the need to wait 7 days to mark a repository as inactive for pruning, and the risk of pruning chunks for backups that take longer than 7 days.

Depending on which platform you're running on (Windows/Mac/Linux), I have had different issues. On Windows, user-mounted drives are sometimes difficult to access from system processes. This would mainly be an issue if you run Duplicacy as a service. I have had similar issues with Arq and IDrive, just to name a few.

On Mac, the same can sometimes be true for FUSE mounts, and the --allow-other mount flag may be needed.

Depending on your specific setup this can either be a non-issue or a bit more of a pain.

(1) The first issue I've seen is that pruning of past backups will not care whether the folder was available or not, and may happily remove all backups where it was mounted.

(2) As @saspus mentioned, if the folder is unmounted, it may be backed up as an empty folder. A subsequent backup will then see all data in that folder as new and back it up again. Normally this isn’t an issue, other than having to read and hash all data, as it will generally deduplicate the data quite well.

For me specifically, I have an application (thanks, Apple Photos) that adds and removes files "randomly" with different names and paths. This, combined with the files being around 3 MB, means a fresh hash of the drive would add a very large number of new chunks.


Are you backing up the whole bundle? There is a lot of transient and derivative data that does not seem to have an exclusion attribute. If you instead only back up the "originals" subfolder – that's quite stable.

I did not know VeraCrypt uses FUSE, but it makes a lot of sense now. Yes, --allow-other is key if mounting and backup are done by different users. But even then it might not be enough, that is, if they implemented security properly. Even root on macOS cannot read user data; there is a separate data protection mechanism. That's why Arq had to impersonate a user with a separate helper process to reach user-mounted filesystems.

Somewhat tangential – but FUSE on a Mac may not be such a great idea. It requires kernel extensions, and your security now hinges on the lack of bugs in the osxfuse implementation. Google, Box, etc. have all abandoned this approach, not only on a Mac but on other OSes as well, such as Windows. The right approach would be to use the FileProvider API, or do what many other apps, like MountainDuck, are doing – serve data from a local instance of an NFS server. It then becomes a network-mounted folder and all that infrastructure "just works".

Instead of VeraCrypt on macOS I would suggest switching to encrypted sparse bundles. If the data is important, that's the more reliable path forward – sparse bundles are first-class citizens on macOS and will always be supported. I'm using them myself for sensitive data that has to live on other servers. The decryption key resides in the keychain, so the whole dataset benefits from the entire chain of trust and secret management on macOS – which is second to none.

Hi @saspus!

Thank you for the great suggestion. Unfortunately, I do already only back up the originals folder, but as free space on my drive changes, Photos sometimes evicts and re-downloads files.

This isn’t a huge issue, unless I were to force a full hash. Just wanted to mention it.

Sorry, I do not know what VeraCrypt uses on Mac; I was only speaking in general terms about FUSE mounts.

Thank you for pointing that out! There is a great project called FUSE-T (https://www.fuse-t.org/ - no affiliation) that intends to implement the FUSE API kext-less, using a local NFS server. Pretty interesting.

Unfortunately the pCloud Mac Client requires regular FUSE but I think that says more about pCloud…

An alternative, albeit with arguably less security, is rclone and Cryptomator. I have had good experiences with both, and they would allow file-level backup of the encrypted data if desired.

rclone has a 1:1 mapping between clear-text folders and files and their encrypted versions. Not as secure but very convenient if you wish to do a partial restore of a few encrypted files from a backup.


I see. And the file names change from download to download? That’s interesting. I’m wondering why.

BTW, another tangent: Arq now implements support for dataless files with providers that use the FileProvider API. If the file is not local, you have a choice to skip it or materialize it; and once materialized, if the metadata did not change, it can be made dataless again and won't need to be re-materialized for subsequent backups. I've filed a feature request for Duplicacy for this. It's an awesome feature. I don't think it works with Photos though, because Photos manages its own cache.

Which brings up another point – for immutable media, Duplicacy's compression, versioning, and deduplication are rather pointless, so you could technically just rclone copy the files to the target, e.g. to Amazon Glacier. Then you wouldn't worry about files coming and going (as long as the filenames are stable – which, since apparently for Photos they are not, makes it a non-starter). I'm still holding on to my Mac with an 8 TB SSD, but I really want to be able to migrate to a faster one with a smaller disk. All those details prevent me from doing so.
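
For data where the names are stable, a rough sketch of that approach (wrapping the rclone call in a small Python script; the remote name and paths are just placeholders for illustration) could be as simple as:

```python
# Rough sketch: mirror an immutable media folder to an rclone remote.
# Assumes rclone is installed and a remote (here called "glacier") has
# already been set up with `rclone config`; names and paths are examples.
import subprocess

SOURCE = "/Volumes/Media/originals"   # example local folder with immutable media
DEST = "glacier:photo-archive"        # example rclone remote:path

subprocess.run(["rclone", "copy", "--progress", SOURCE, DEST], check=True)
```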

Wow!! This is awesome! I’ll be watching it closely.

Cryptomator does too, and benefits from a well-made UI (CyberDuck and MountainDuck – the NFS-based mounter). Still, this is not a "real" disk in many respects – a sparse bundle, on the other hand, behaves indistinguishably from a disk. This might matter for some use cases.

Yes, I wish I had enough storage on at least one trusted machine to hold all my data. That would avoid a lot of my problems with cloud data and rclone mounts…

Yes, quite exciting! It seems both rclone and Cryptomator support it already.

Hi @saspus and @alind
Thanks for your replies and information! Much appreciated; this is helpful.

In reply to some of the questions:

The issue with Arq was that the service starts on boot, but my VeraCrypt volumes aren't mounted until after I boot and choose to mount them; apparently VeraCrypt doesn't dispatch volume arrival/departure events to system services correctly. So I created a script that started the Arq service and then started the app after my drives were mounted, which worked because the agent started after the drive was already mounted and could then see the path/drive. I asked support whether running the service only when I need to back up was supported and was met with "it's not the intended use of Arq and we can't guarantee it'll work in future". For backups, I decided I need something a bit more future-proof.

Would the script be only for the CLI version, or could I put the script in the "command options" in the GUI? I'm currently mounting Z:/ and backing up 2 folders underneath it, and P:/, backing up everything underneath it with filters. I only fire off a backup when I know they're attached. If I fire off a backup when they aren't attached, the backup fails (which is perfect, as I'd rather it fail than back up something which isn't there :slight_smile:)

I'm not sure I understand this correctly. I currently don't have a prune set up; I assume this is only an issue if I set up a prune job with the default settings?

Ah yep, I'm on Windows, currently using it as "run when I need to" via the GUI and web UI. I was considering a service, but I like the fact that I can just run it when I want to run a backup. Thanks for pointing out the issues with running as a service; I'll keep this in mind for future reference.

Understood. It seems fine for me at the moment in testing, as it fails if the drives aren't mounted. If I then mount them and run again, it works OK.


Thanks again for your help.

Hi again @Akita!

Sorry, this was a general comment on Duplicacy and not specific to your use of it. :+1:

Then I think everything you described wanting to do with Duplicacy should work perfectly fine!

Happy backing up :slight_smile:
