Best way to backup virtual machines (KVM)

I’m running a couple of KVM virtual machines on my home server. Currently I’m doing an initial backup of / using this filter file:


+home/*
+etc/*
+var/*
+usr/local/etc/*
+usr/local/
+usr/
+root/*
+var/lib/*
+var/www/*
+var/local/*
+var/

-*

Later on, I was going to install duplicacy on the virtual machines and back those up too. But now it struck me that I might already be backing up the KVM guests. Is this so? (I couldn’t figure out where they are stored.)

And if I am currently backing them up, is it recommended practice to back the VMs up from the “outside” rather than from within them?

They should be in /var/lib/libvirt/images/ by default, so they’re probably already being backed up.

I think backing up from outside is preferable. However, the VM should be in a consistent state during the backup. I don’t know what the best way is to accomplish this with KVM and I think it would depend on what’s running in the VM, but my initial thought would be either

  1. Take snapshots of the running VMs and back those up instead of the running images (example not using duplicacy)
  2. Create a pre-backup script that shuts down all of the VMs before backing up the images, then restart them all with a post-backup script (if downtime doesn’t matter, of course)
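Option 2 could be sketched as a small wrapper script around virsh. Everything here is illustrative (it assumes the guests are managed by libvirt, and the backup command is a placeholder), not a tested recipe:

```shell
#!/bin/sh
# Pre-backup: ask every running guest to shut down gracefully.
for vm in $(virsh list --name); do
    virsh shutdown "$vm"
done

# virsh shutdown is asynchronous, so wait until no guests are running.
while [ -n "$(virsh list --name | tr -d '[:space:]')" ]; do
    sleep 5
done

# Run the actual backup of the image directory here, e.g.:
duplicacy backup

# Post-backup: start the guests again. Note this also starts guests that
# were already off before the backup; record the running list first if
# that matters in your setup.
for vm in $(virsh list --name --state-shutoff); do
    virsh start "$vm"
done
```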

I’m not sure. It would mean I’m backing up an image of about 7 GB daily while only a few small files on that system might actually have changed. I’m not sure how those image files are built, but I’m worried that some small changes might alter large parts of the image files so that most of it will be backed up again, which would be a huge waste of resources.

Also, I’m backing up the entire operating system. Not a huge deal, but why use storage space to back up something that definitely doesn’t need to be backed up?

Finally, to the extent that that system contains files that have already been backed up from another system, I’m not sure whether deduplication will work for those.

So, as far as I can see, the only advantages of backing up VMs from the outside are:

  1. it’s easy
  2. it makes restoring extremely easy.

And #2 could also be achieved by doing a single (or very infrequent) backup from the outside to easily bring the system back, and then running a restore from within to bring it up to date. No?

This is definitely a good idea, if backing up from the outside is the way to go.

I should have thought through the potential use cases (e.g., what’s an acceptable restore time) and setups (e.g., are VMs manually or automatically configured) more before responding. I currently back up VM images using ZFS with sanoid/syncoid, which replicates them to a backup server, but ZFS knows exactly what changed and what didn’t between snapshots (which can’t be said for duplicacy).

I’m not sure how well duplicacy would handle changes in VM images and deduplication of OS files across them (I’d be interested in this test), but the main draw would be relative ease and speed of restoration – like you mentioned. If something fails, pretty much all you need is the VM image; no reinstalling stuff, hoping you remember or documented everything needed to get the applications working again, restoring up-to-date per-VM application configs and data, etc.

If you have the right setup in place, probably the most elegant and space-efficient VM backup method in my mind would be to

  1. automate VM provisioning and configuration using Ansible (or similar, and related tools)
  2. back up VM application data and database snapshots (stored in a location the host running the backup has access to)
  3. back up the Ansible playbooks

In the case of a disaster, you’d only need these 3 items to get back up and running.
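As a hypothetical example of step 2 for a VM running PostgreSQL, a nightly dump into a directory the backup host reads might look like this (database name, user, and path are all illustrative):

```shell
# Dump the database and compress it into the directory that gets backed up.
pg_dump -U appuser appdb | gzip > /srv/backups/appdb-$(date +%F).sql.gz
```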

Maybe for some, but that would no longer be “extremely easy” for me. At least for my VMs I’d need to do a bunch of up-front work for each one to identify what exactly in the image needs to be backed up, how to back it up safely (especially any databases), and schedule backups and error emails for each VM to be run separately from the one on the host. Then if I ever need to restore from backup, go through at least one extra step for each VM.


Ah, this sounds interesting. I could create such backups locally and then back them up with duplicacy…

But what is the role of zfs here? Does it only work if the images are on zfs? On my server, the SSD filesystem is an ordinary ext4 (or whatever it is), but the mounted main HDD storage is zfs. Currently the images are on the SSD, so I guess your setup wouldn’t work?

zfs’ role is creating atomic filesystem-level snapshots and providing mechanisms for replicating them to another server. A snapshot taken while the VM is running should be roughly equivalent to the state the VM would be in if it suddenly lost power (which is fine for my use cases). It does require that the VM images be stored on zfs.
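For reference, the zfs side of this can be quite short. Dataset and host names below are illustrative, and it assumes sanoid/syncoid are installed:

```shell
# Take an atomic, point-in-time snapshot of the dataset holding the images.
zfs snapshot tank/vm-images@daily-$(date +%Y-%m-%d)

# Replicate to the backup server; syncoid sends only the blocks that
# changed since the last common snapshot.
syncoid tank/vm-images root@backuphost:backup/vm-images
```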

Right. This wouldn’t be an option if you’re not using zfs. I only mentioned it because storage-/filesystem-level snapshots are the only way I know of to back up atomic snapshots of VM images. I’m not sure how tools like duplicacy, which need to read through the files while they’re also being written to, can handle this (except in environments where something like VSS is available).

If you can find a way to take consistent snapshots, I would recommend backing up the VM images directly. This is basically how Vertical Backup does it, and it works really well. Just remember to switch to fixed-size chunking when initializing the storage (duplicacy init -c 1M -min 1M -max 1M).
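A minimal sketch of that setup, assuming the images live in the default /var/lib/libvirt/images and using an illustrative storage ID and URL:

```shell
cd /var/lib/libvirt/images
# Fixed-size 1 MB chunks: VM images are modified in place, so fixed chunk
# boundaries let unchanged regions deduplicate across backups.
duplicacy init -c 1M -min 1M -max 1M vm-images /path/to/storage
duplicacy backup -stats
```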


Is this just because you know the VM image snapshot will always be the same size, and so Duplicacy can better deduplicate internal chunks of that data?

That is right. There is really no need to use the more complex and thus slower variable-size chunking algorithm when you don’t insert or delete data in the middle of a VM image; writes happen in place, so fixed-size chunk boundaries stay aligned between backups.
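A quick way to see why this works: with fixed 1 MB chunks, flipping a single byte in an image changes exactly one chunk. A sketch with plain coreutils (file names illustrative):

```shell
# Create an 8 MB "image" and split it into fixed 1 MB chunks.
dd if=/dev/zero of=img bs=1M count=8 2>/dev/null
split -b 1M img before.

# Overwrite one byte in place at offset 3,000,000 (falls in chunk "ac").
printf 'x' | dd of=img bs=1 seek=3000000 conv=notrunc 2>/dev/null
split -b 1M img after.

# Comparing hashes shows only that one chunk changed.
md5sum before.* after.*
```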