A strategy for monitoring backups using sendmail

austin.france · 5 July 2019 10:23

I would highly recommend maintaining some kind of backup status overview, such as the one described above.

I manage and monitor backups for a global network (some 800 individual backups spread across around 100 machines worldwide), and thought I would share a strategy I use for monitoring backups.

At the centre of the strategy is a backupStatus table in a database. Each time a backup completes (successfully or otherwise), it reports its status in one of two ways: if it is on the same internal network as the database, it updates this table directly (typically via a command-line utility); otherwise, it sends a status email to a special address, and a process monitoring that address updates the database.

This results in a record being added to the backup status table for each backup (or not, if there was an issue). Each record holds the start and finish times of the backup, the backup status, the file count and size in MB, the live host name (the machine being backed up), the backup set name (some machines have multiple backups), and the backup machine host name.
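To make the record layout concrete, here is a minimal sketch of such a table. It is an assumption, not our actual schema: the column names are made up for illustration, and it uses Python's built-in sqlite3 as a stand-in for the real database.

```python
import sqlite3

# Hypothetical schema: column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS backupStatus (
    id          INTEGER PRIMARY KEY,
    live_host   TEXT NOT NULL,   -- machine being backed up
    backup_set  TEXT NOT NULL,   -- a machine may have several backup sets
    backup_host TEXT NOT NULL,   -- machine that holds the backup
    started_at  TEXT NOT NULL,   -- ISO 8601 timestamp
    finished_at TEXT NOT NULL,   -- ISO 8601 timestamp
    status      TEXT NOT NULL,   -- e.g. 'success' or 'failed'
    file_count  INTEGER,
    size_mb     REAL
)
"""


def record_backup(conn, live_host, backup_set, backup_host,
                  started_at, finished_at, status, file_count, size_mb):
    """Append one row per completed backup run (rows are never updated)."""
    conn.execute(
        "INSERT INTO backupStatus (live_host, backup_set, backup_host, "
        "started_at, finished_at, status, file_count, size_mb) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (live_host, backup_set, backup_host,
         started_at, finished_at, status, file_count, size_mb),
    )
    conn.commit()


# In-memory database so the sketch runs without any setup.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
record_backup(conn, "web01", "home", "backup01",
              "2019-07-05T01:00:00", "2019-07-05T01:12:30",
              "success", 48211, 1532.4)
```

Appending a new row for every run, rather than overwriting one row per backup, is what keeps the history available for graphs and trend checks.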

This gives us a historical record of each backup. It also allows us to generate graphs that show disk usage and file count over time, as well as time taken to run the backup.

It also allows us to produce a current status page, which shows the last known state and the age of the last run for each backup, highlighting those that are older than a specified period.

A report is emailed each morning to the backup administrators, listing only those backups whose age has become a concern or that are not showing a success status. We don’t report on successful backups (there are just too many), only ones of concern.
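The selection behind that morning report could be sketched roughly as follows. This is an illustration only: the `BackupRecord` type, the `backups_of_concern` function and the 36-hour threshold are assumptions, not our actual code or settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupRecord:
    # Hypothetical record shape, mirroring a few backupStatus columns.
    live_host: str
    backup_set: str
    finished_at: datetime
    status: str


def backups_of_concern(records, now, max_age=timedelta(hours=36)):
    """Return the latest record of each backup that is overdue or not successful."""
    latest = {}
    for rec in records:
        key = (rec.live_host, rec.backup_set)
        # Keep only the most recent run for each (host, backup set) pair.
        if key not in latest or rec.finished_at > latest[key].finished_at:
            latest[key] = rec
    return [
        rec for rec in latest.values()
        if rec.status != "success" or now - rec.finished_at > max_age
    ]
```

Checking age as well as status matters because a backup that has stopped running never logs a failure; only its last record getting older gives it away.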

We also developed a Nagios plugin to query backup status, and have some of our more critical backups monitored by Nagios along with our other critical systems.
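For anyone wanting to build something similar, the core of such a check might look like the sketch below. It is not our plugin: the function name and the warning and critical thresholds are assumptions. The only fixed part is the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN).

```python
from datetime import timedelta

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_backup(status, age,
                 warn_after=timedelta(hours=26),
                 crit_after=timedelta(hours=50)):
    """Map the last known backup state to a Nagios exit code and message.

    status: status of the latest record, or None if no record exists.
    age:    time since that backup finished, or None if no record exists.
    Thresholds are illustrative defaults, not recommendations.
    """
    if status is None or age is None:
        return UNKNOWN, "UNKNOWN - no backup record found"
    hours = age.total_seconds() / 3600
    if status != "success":
        return CRITICAL, f"CRITICAL - last backup status is {status}"
    if age > crit_after:
        return CRITICAL, f"CRITICAL - last backup is {hours:.0f}h old"
    if age > warn_after:
        return WARNING, f"WARNING - last backup is {hours:.0f}h old"
    return OK, f"OK - last backup succeeded {hours:.0f}h ago"
```

A real plugin would look up the latest record for the requested backup, print the message on one line, and exit with the returned code.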

We record file count and size so that we can spot problems with a backup. A sudden drop in file count or size suggests that something has moved and the backup was not updated to follow it, or that something was mistakenly removed. Excessive growth suggests the backup may need review.
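A simple way to flag such changes is to compare each run with the previous one. The sketch below is only an illustration; the function name and both ratios are assumptions and would need tuning for each backup.

```python
def size_alert(previous, current, drop_ratio=0.5, growth_ratio=2.0):
    """Classify the change between two consecutive measurements.

    Works for either file count or size. Returns 'drop', 'growth' or 'ok'.
    The default ratios are illustrative assumptions.
    """
    if previous <= 0:
        return "ok"  # nothing meaningful to compare against
    ratio = current / previous
    if ratio <= drop_ratio:
        return "drop"    # e.g. data moved or removed
    if ratio >= growth_ratio:
        return "growth"  # e.g. backup growing excessively
    return "ok"
```

For example, `size_alert(48000, 12000)` returns `"drop"`, since the file count fell to a quarter of the previous run.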

We are in the process of migrating some of our backups, which are based on rdiff-backup, to duplicacy and we will continue to use the same strategy. I use the list option to capture information about last snapshot size and file count, record start and end time, and email the backup result to our backup monitoring email address.

We also mirror our backups and these are recorded too, so we know mirroring is working.

towerbr · 12 July 2019 03:32

@austin.france, which “toolset” are you using to capture these emails and update the database?

austin.france · 12 July 2019 07:38

Sendmail. In /etc/mail/alias.user we have:

mailstatususer: "|/etc/mail/logstatus.sh"

mailstatususer is the name of the user at yourdomain.com that will be sent the status email, and can be anything you want.
logstatus.sh parses the incoming email and builds an insert statement, which it passes to the mysql command-line tool.

The parse stage looks for lines matching NAME=value. These are the individual bits of information we want to store in the DB, such as the status, size, name of the backup, etc.
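Our script is a shell script, but the parse stage is easy to illustrate in Python. This is a sketch under assumptions, not logstatus.sh itself: the function name, the pattern for what counts as a NAME, and the field names in the example are all hypothetical.

```python
import email
import re
from email import policy

# Assumed convention: NAME is upper-case letters, digits and underscores.
PAIR = re.compile(r"^([A-Z][A-Z0-9_]*)=(.*)$")


def parse_status_email(raw_message):
    """Extract NAME=value pairs from the plain-text body of a status email."""
    msg = email.message_from_string(raw_message, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    fields = {}
    for line in text.splitlines():
        match = PAIR.match(line.strip())
        if match:
            # Lines that are not NAME=value (greetings, blank lines) are skipped.
            fields[match.group(1)] = match.group(2).strip()
    return fields
```

Because sendmail pipes the message to the aliased program on standard input, a script built on this would call `parse_status_email(sys.stdin.read())`. The values come from email, so they should be passed to the database as query parameters rather than pasted into the SQL text.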

austin.france · 29 October 2019 11:17

Here is an example of what you can do with the data logged by recording backup status. I recently wrote this portal to our backupStatus table (an upgrade of a previous, cruder version). It gives us a quick overview of the health of the many hundreds of backups we run daily. We can also view reports on overdue backups, latest backup status, etc., and we are emailed daily with details of overdue backups (which is important, because a backup that isn’t running isn’t logging a failed status).