Is there an efficient way to backup ZFS datasets of VM disks?

I have a Linux (Proxmox) host on ZFS running some QEMU/KVM VMs and am wondering whether I can use Duplicacy for off-site backups.

Here are my thoughts and concerns so far. Since I’m really unsure about all of it, it would be great if someone could comment a little:

  • I want to run the backups from the host and not inside the VMs, since ZFS snapshots/clones give quite consistent data states. Am I right in thinking that backing up a Linux guest with a running DBMS of some sort (be it MySQL, MongoDB or whatever) “from the inside” with Duplicacy would be just the same kind of gamble on the repair capabilities of the DBMS as with nearly any other backup solution out there?
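The host-side workflow above would boil down to something like the following sketch; the dataset and snapshot names are examples, so adjust them to your pool layout:

```shell
# Crash-consistent backup state via a ZFS snapshot (names are examples).
DATASET="rpool/data/vm-100-disk-1"
SNAP="$DATASET@duplicacy-$(date +%Y%m%d)"

# Only act if zfs and the dataset actually exist on this machine.
if command -v zfs >/dev/null 2>&1 && zfs list "$DATASET" >/dev/null 2>&1; then
    zfs snapshot "$SNAP"    # atomic, point-in-time state of the zvol
    # ... run the backup against the snapshot here ...
    zfs destroy "$SNAP"     # drop the snapshot once the backup is done
else
    echo "would run: zfs snapshot $SNAP"
fi
```

Note this gives a crash-consistent image, not an application-consistent one, which is exactly the DBMS gamble described above.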

  • Ideally I’d like to do something like zfs send rpool/data/vm-100-disk-1@my-backup-snapshot | duplicacy-some-command-to-process-and-backup-data, but I fear Duplicacy cannot do this, as it is designed to work only with files. Is this right, and if not, what would the command look like?

  • If piping zfs send to Duplicacy is not an option (otherwise skip this point), I’d really like to avoid dumping a snapshot to a temporary file and backing that file up, because each would be between 40 and 400 GB, which would mean massive write overhead. All snapshots/clones are also always available via /dev/zvol/rpool/data/vm-100-disk-1@my-backup-snapshot as a “normal” block device — though obviously containing whatever file systems the guest OS uses, which cannot easily be mounted on the host. Is there a good way to tell Duplicacy to read them block by block, or a workaround to expose them as a file that would look like the output of zfs send rpool/data/vm-100-disk-1@my-backup-snapshot > my-temp-file-to-backup, but without the write overhead?
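One thing that might be worth trying: Duplicacy follows first-level symlinks in the repository root, so you could symlink the snapshot’s block device into the repository. I don’t know whether Duplicacy will read a block special file at all, so treat this as an experiment; all paths below are examples:

```shell
# Workaround sketch: expose the snapshot's zvol device to Duplicacy via a
# first-level symlink in the repository root. Whether Duplicacy reads a
# block special file is untested -- this only sets up the layout.
REPO="${TMPDIR:-/tmp}/vm-100-repo"
DEV="/dev/zvol/rpool/data/vm-100-disk-1@my-backup-snapshot"

mkdir -p "$REPO"
ln -sf "$DEV" "$REPO/vm-100-disk-1.img"
# cd "$REPO" && duplicacy backup
```

If Duplicacy refuses the device node, the fallback would be the temporary-file dump, with all the write overhead described above.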

  • If there is such a way, or even if I really go for the ugly temporary file, what would be the best chunk size setting in this case? I read that fixing it to 1M is recommended for other VM technologies, but with ZFS I would always have the full disk content (I’m not planning to implement an incremental ZFS snapshot strategy) — wouldn’t I strip Duplicacy of some of its deduplication capabilities by fixing the chunk size? I.e. if I have another VM with the same OS but with files/stuff at slightly different positions, will I get any deduplication effect?

  • Is there anything else to consider — any reason this could work well, or any reason it shouldn’t be done at all?

Thanks a lot for any input on this topic!

I don’t have experience with ZFS but I’ll try my best to answer your questions.

That is correct. Duplicacy can’t guarantee consistent reads on the DBMS files.

Duplicacy can’t read from stdin.

A fixed chunk size would be sensitive to byte insertions/deletions, so I think it makes sense in this case to use the default variable chunk size setting. However, I don’t know what the optimal chunk size would be; you’ll need to figure that out by trial. And yes, the same file at different disk positions may defeat deduplication, unless you use a chunk size close to the disk sector size (which I would not recommend, as performance with such a small chunk size would be really bad).
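For experimenting, note that chunk sizes are set when the storage is initialized. A sketch using Duplicacy’s init flags with its documented defaults (4M average, 1M/16M min/max); the repository id and storage URL here are made up:

```shell
# Chunk sizes are fixed at init time. The -c/-min/-max flags are real
# Duplicacy CLI options; "vm-backups" and the B2 bucket are placeholders.
CMD="duplicacy init -c 4M -min 1M -max 16M vm-backups b2://my-backup-bucket"

if command -v duplicacy >/dev/null 2>&1; then
    $CMD
else
    echo "would run: $CMD"
fi
```

Pinning min = avg = max (e.g. all to 1M) would give the fixed-size chunking mentioned above; changing chunk sizes later requires a new storage.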