Using Duplicacy for full Windows (and possibly macOS) bootable imaging

I understand that Duplicacy functions as a file-level backup tool. I am very interested in leveraging its high backup efficiency to keep regular ol’ files backed up well, for anything that is not resident on ZFS pools (which already provide robust and efficient replication via zfs send/recv).

I will transition my Linux OS disks to ZFS, so that handles those.

This leaves primarily the Windows OS drives to back up.

I would generally be satisfied with having a file-level backup, but I wanted to explore going one step further: having a safety net for the Windows and macOS OS disks.

It’s possible to make full bootable backups of operating systems with imaging tools, e.g. Veeam on Windows, and CCC and others on macOS; many choices exist. We consume some resources (mainly processing time) on the source computers to fully image them regularly, but I want to see if it can be made practical.

My question is about evaluating Duplicacy’s performance for this somewhat extreme case: terabyte-scale files, with deduplication of content chunks within files. A few tens or hundreds of gigabytes may change between each version of a disk image.

The use case is to periodically make full disk images of the macOS and Windows OS disks. The question here is about the efficiency of syncing huge image files regularly.

  • One huge full disk image for each computer is generated and stored locally or, say, on a ZFS Samba share on my NAS.
  • Let’s just say that I keep one copy of each, and that once a week the full disk image gets replaced there. So a good deal of the content changes, but it should still be a fraction of the overall image file’s content.
  • To implement the offsite leg of a 3-2-1 backup, I could use zfs send/recv replication to keep a second zfs pool synchronized offsite, or I could implement this with Duplicacy into Backblaze (a rough sketch of the zfs option follows this list).
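For reference, the zfs option would just be incremental replication of the dataset holding the images; a rough sketch, where the dataset names (tank/images, backup/images) and the remote host (nas2) are hypothetical:

    # take this week's snapshot of the dataset holding the images
    zfs snapshot tank/images@week-23
    # send only the blocks changed since last week's snapshot to the second pool
    zfs send -i tank/images@week-22 tank/images@week-23 | ssh nas2 zfs recv -F backup/images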

The question here is: if 1GB changes on a given computer, while the computer has 2TB of storage and is therefore producing, say, a 1.5TB full disk image every day, I need Duplicacy to be intelligent enough to transfer only ~1GB to the backup target. The question is simple: will it do this, or will it re-transfer 1.5TB? Obviously we must assume the disk image is not compressed or encrypted, so that blocks can match up. As mentioned above, my fibered-up LAN may handle a few terabytes of images getting recorded each day or week, but this would not be practical for egress offsite, even on gigabit internet.

Assuming the above is possible and can work, I also want to know what sort of control we might have over it. With ZFS, we would simply replace the images and then use snapshots to preserve past state at the desired intervals. We might also (a questionable approach, however) turn on deduplication and keep multiple copies of older images around.

I imagine that with Duplicacy, snapshots going back in time can be explicitly managed, which will be nice. The question is: could we delete intermediate snapshots? If we take one snapshot each week but I want to keep only one per month for anything older than 3 months, I would need to delete the 2nd, 3rd, and 4th weekly backups from each of the older months.
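If I read the docs correctly, Duplicacy’s prune retention flags express exactly this kind of schedule; a sketch, where -keep n:m means “keep one snapshot every n days for snapshots older than m days” and the numbers just mirror the policy above:

    # keep one snapshot per ~30 days once they are older than 90 days,
    # and one per week for anything older than 7 days
    duplicacy prune -keep 30:90 -keep 7:7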

It’s much easier to reinstall Windows and restore data, but if not, there are specialized tools for full system backup. Note that you would need to not only make a bootable image but also inject drivers to even be able to see most hardware. Veeam does this well.

On macOS it’s plain impossible, and unnecessary. You can always restore macOS from the recovery partition or the internet and then restore your data. A better approach, however, is Time Machine plus Migration Assistant.

That said,

You can capture images and then back them up with Duplicacy (configured with a fixed chunk size, similar to and for the same reasons as Vertical Backup, to improve deduplication). The same applies to Time Machine sparse bundles. No zfs involvement needed.
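A minimal sketch of that setup, assuming a Backblaze B2 target; the snapshot ID, bucket name, and the 1M chunk size are placeholders, and setting -min and -max equal to -c is what makes the chunk size fixed:

    # initialize the repository over the folder holding the disk images,
    # with a fixed chunk size (min = max = average), then back it up
    cd /path/to/disk-images
    duplicacy init -c 1M -min 1M -max 1M disk-images b2://my-duplicacy-bucket
    duplicacy backup -stats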

But I would suggest reconsidering. Machines and their OSes are disposable; user data isn’t. So backing up user data makes sense, but backing up the system doesn’t.

If you have a fleet of machines, then your restore image should contain all the configuration, so that restoring user data alone is sufficient.

Anyway, my 2 cents.

I used Carbon Copy Cloner (the demo period) to make a bootable clone of an Intel MacBook to an APFS portable hard drive partition, and was able to confirm it boots on a different Intel MacBook. So it is possible.

But at the end of the day I am totally with you… upon further reflection, there is a very narrow range of scenarios where having a bootable image is worthwhile, and there would be a lot to deal with in the regular, huge network consumption involved in just the first stage of saving and replicating such images. As you say, the core OS parts are really ephemeral, so only user data needs backing up. I am actually gearing up to experiment with LXC and Docker (sometimes both together) in Linux land, where I hope to get the software I use into a more portable and easily restored state; being consistent in this regard for macOS and Windows only makes sense.

Also, it may not have been obvious, but I was always talking about doing this in addition to already using Time Machine on macOS and using Duplicacy the normal way (backing up the user directory and maybe some application directories) on Windows. I would also use Duplicacy for the Linux machines not already running root on ZFS.

Conclusion: Drop the notion of trying to image and preserve full OS disks.

From the horse’s mouth:

By default, CCC does not back up the read-only “System” component of the startup disk; that part of macOS cannot be restored, it can only be reinstalled by the macOS Installer. When you configure a backup of your startup disk, CCC will back up the contents of the Data volume. That’s all of your data, all of your applications, and all of your system settings – everything about your Mac that is customized.

https://support.bombich.com/hc/en-us/articles/20686428184727-Why-doesn-t-my-backup-show-up-as-a-startup-device

[added emphasis]

You can try asr to make a clone and restore it, but then you would need Apple to seal the volume. And if the current OS contains critical security fixes, I would not expect Apple to sign an old version; that would be a way to backdoor the system.

So yeah, it’s impossible for all practical purposes.

Indeed. See https://support.bombich.com/hc/en-us/articles/20686422131479-Creating-legacy-bootable-copies-of-macOS for the method I used, which works to obtain the bootable partition, though it may only work on Intel Macs. I made use of it this time for selling my physical Intel Mac. Kinda what got me down this road.

It’s not the same as restoring the system, though: it’s attempting to boot from an external image. And for that you have to enable booting from external media in the Recovery partition. Which means you have to have that partition. At which point you can just restore macOS…

I’m just curious: why go this route instead of Migration Assistant?

But while I have you here, I’d like to ask another question. Duplicacy would sort of overlap with Time Machine’s use case of making incremental backups, but I expect Duplicacy may be significantly faster or more efficient with system resources. It also has a different interface (choose which locations on disk to back up, instead of maintaining a list of locations to exclude), which may be better.

Have you any general comparison notes here?

Maybe I’ll just stick with TM for the peace of mind of having the various integrations like Migration Assistant.

Time Machine is pretty much snapshot replication into an encrypted sparse disk image. It takes forever to do a backup on purpose: there is no hurry, and using only a small amount of idle resources makes it invisible to the user.

If you have to speed up Time Machine (e.g. you are embarking on a journey into the wilderness and need the backup to finish ASAP), you can do this:

sudo sysctl debug.lowpri_throttle_enabled=0
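and, if I recall correctly, the default is 1, so set it back afterwards to restore the normal low-priority throttling:

    sudo sysctl debug.lowpri_throttle_enabled=1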

Duplicacy, on the other hand, runs full blast, as if there is no tomorrow. (I used to run it under CPULimiter; see the scripts here: GitHub - arrogantrabbit/duplicacy_cli_macos: Install script to configure duplicacy on macOS for all users without disabling SIP.)

Therefore, comparing Time Machine and Duplicacy performance would be quite difficult. Both end up doing similar things (compression and copying), but with different approaches: snapshotting vs. chunking.

Duplicacy supports excluding all the files that carry the Time Machine exclusion xattr (with a handful of known exceptions, like Library/Caches, which Time Machine excludes explicitly without relying on a flag), so you can use the same exclusion mechanism (tmutil addexclusion) to manage both.
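If memory serves, the relevant backup flag is -exclude-by-attribute; roughly (the path is just an example):

    # mark a folder with the Time Machine exclusion attribute (sticky, travels with the folder)
    tmutil addexclusion ~/Downloads
    # and have duplicacy skip anything carrying that attribute
    duplicacy backup -exclude-by-attribute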

Yes. That’s what I ended up doing myself. I back up everything to my NAS, and replicate (snapshots only) to a remote NAS. TrueNAS has a Samba extension that takes a zfs snapshot of the Time Machine share upon successful completion of a backup (when Time Machine cleanly unmounts). This was done to be able to recover from the “Time Machine detected <whatever> and needs to start backup from scratch” message. To be fair, in my years of using TrueNAS I have never seen this issue again (I saw it almost weekly when using the wretched Synology, because Synology sucks and iXsystems know what they are doing). So I took it a step further, and now back up via Time Machine over ZeroTier as my primary means of backup. Imagine that: Time Machine, over SMB, over ZeroTier, from across the world, and not a single hiccup in years.

The biggest benefit is the “less is more” mindset: the fewer extra tools used, the better. As long as macOS exists, Time Machine will not only exist but will work.

My Time Machine bundle is now about 8TB in size. I started a new bundle when Time Machine switched to APFS (and kept my previous bundle intact, in case I ever want to restore anything from ancient history; why not, space is cheap).

I also do a separate backup to Amazon Glacier Deep Archive, in case both NASes die at the same time, but that’s more archiving/disaster recovery than backup.

And with Duplicacy, I back up a very small subset of documents (literally just the ~/Documents folder) to STORJ, mostly to keep an eye on Duplicacy development, waiting for the features I need so I can get rid of the other app I use for the Glacier backup, and for support for dataless files.

I have two more Intel Macs and one more Intel iMac in the house (as well as two other Apple Silicon Macs), in addition to the 5 or so PCs of varying vintage I own, so you can clearly see why I am trying to clean house, if not to get back some of the investment then just to regain physical desk space. When I got my M1 Max MacBook I opted to set it up fresh. I don’t have a replacement machine onto which I want to transfer my old Intel MacBook’s state, but I may want to be able to boot a Mac into the state I had it in, so this disk partition image is a handy one to have.

Thanks for the added tips and for sharing your solutions. ZeroTier is interesting, though I am unsure whether today it has been supplanted by the likes of OpenZiti. So far I have had a great experience with Time Machine going straight into my originally experimental Samba access to the initial ZFS pools I built on ZFS 0.8 on this old Ubuntu 20.04 install on my workstation. Once I added a few “fruit” and “multichannel” (or something) settings to my Samba config, I was at one point also able to see 2GB/s transfer speed to NVMe over fiber on my 40Gbit network, which certainly made my day. (That wasn’t with Time Machine, obviously.)
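For what it’s worth, the Samba settings in question were roughly these (reconstructed from memory, and the share path is hypothetical, so treat the exact values as approximate):

    [global]
        server multi channel support = yes
        vfs objects = catia fruit streams_xattr
        fruit:aapl = yes
        fruit:metadata = stream

    [timemachine]
        path = /tank/timemachine
        fruit:time machine = yes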

They designed Time Machine to work reliably and, as you explain well, unobtrusively. It manages Samba auth separately on its own, sidestepping the infuriating limitation Microsoft baked into SMB where your computer can only be authenticated to a given server’s shares as a single specific user. Even though some fraction of the time the mDNS name of my Samba ZFS Time Machine target is inexplicably non-resolvable, TM doesn’t seem to get bent out of shape about conditions like that, likely because it records the IP internally and falls back to it. It also has, by default, a good balance of notification intrusiveness, alerting when backups fail because my NAS is down or its zpool is full. I could go on and on about how Time Machine made its way into my S tier of well-designed, well-executed software. From what I’ve been reading, I am hopeful Duplicacy is capable of getting close for the Windows and non-rootfs-on-ZFS Linux incremental backup needs, which would be more than I could have hoped for.

It is interesting that your choice of offsite backup is Glacier. I was imagining Backblaze to be a good solution, but you are probably just deeper into the optimization, e.g. per https://www.reddit.com/r/DataHoarder/comments/10uh8l3/aws_glacier_deep_archive_is_far_superior_to/
so that is informative to know of as the current endgame cost-optimized setup.

I also have a set of files which is small (though it exceeds 2GB now), such as my digital file cabinet: a barely organized stream of the physical mail that I scan. I’d love to move away from using Dropbox for that, if only for the privacy aspect of it. As convenient as Dropbox and similar sync systems are, sync is inherently dangerous due to the possibility of conflicts and other issues.

My 2TB iCloud is also filled with photos, so I need to get to work on Immich at some point.
