RAID disk monitoring with postfix and mailgun

The redundancy of RAID buys you time between disk failure and server failure. But a default RAID setup will happily function with a failed disk, until the next disk fails and your data is lost.

Email notifications have to be configured manually, so that you can intervene (replace a hard drive) after the first disk failure.

At GPXZ we use RAID heavily to work with large datasets. This is how we handle monitoring of RAID arrays, using Mailgun to send email alerts.

These instructions use Ubuntu 20.04; other Linux distributions may keep config files in different locations.

Overview

mdadm and smartd can’t send email on their own: instead they hand messages to mail relay software running on your server. That software could deliver email to recipients directly, but reliable delivery is non-trivial, and reliability matters most for critical alerts. The postfix mail relay software can accept emails from mdadm/smartd and forward them over SMTP to an email delivery service such as Mailgun.

So our setup will look like this:

mdadm/smartd → postfix → mailgun

Postfix email relay

Before getting started, log into the Mailgun web UI. Under the SMTP tab of your domain’s settings you’ll see your Mailgun SMTP domain (like smtp.mailgun.org) and your SMTP login (like postmaster@example.com).

First, install postfix:

sudo apt install postfix

There are some options to select during installation:

  • Choose Satellite System as the mailer type.
  • Use your server’s $HOSTNAME as the mail name.
  • Use your Mailgun SMTP server as the relay host (e.g., smtp.mailgun.org).
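If you’d rather script the install than answer these prompts, Debian-based systems let you preseed the installer’s answers with debconf-set-selections. A sketch, assuming the question names postfix/main_mailer_type, postfix/mailname and postfix/relayhost and the choice text "Satellite system" match your postfix package (you can confirm them with debconf-get-selections, from the debconf-utils package, after an interactive install); the function postfix_preseed is hypothetical:

```shell
#!/bin/sh
# Sketch: print debconf answers for a non-interactive postfix install.
# Assumption: these question names and the "Satellite system" choice
# match Ubuntu's postfix package; verify with debconf-get-selections.
postfix_preseed() (
    mailname="$1"   # the server's mail name, e.g. its hostname
    relayhost="$2"  # the Mailgun SMTP server, e.g. smtp.mailgun.org
    printf '%s\n' 'postfix postfix/main_mailer_type select Satellite system'
    printf 'postfix postfix/mailname string %s\n' "$mailname"
    printf 'postfix postfix/relayhost string %s\n' "$relayhost"
)
```

Pipe its output into debconf before installing, e.g. postfix_preseed "$(hostname)" smtp.mailgun.org | sudo debconf-set-selections, then run sudo DEBIAN_FRONTEND=noninteractive apt install -y postfix.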

Create the file /etc/postfix/sasl_passwd to store your mailgun credentials:

sudo nano /etc/postfix/sasl_passwd

with the following contents:

{mailgun smtp domain} {mailgun smtp login}:{mailgun smtp password}

If you’ve never used SMTP with Mailgun before, you may have to reset your SMTP password. This won’t affect your Mailgun API key or web login password.

The file might look like this (note there is no space after the colon):

smtp.mailgun.org postmaster@example.com:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx-xxxxxxxx-xxxxxxxx

Lock down the permissions of the credentials file, then load it into postfix:

sudo chmod 600 /etc/postfix/sasl_passwd
sudo postmap /etc/postfix/sasl_passwd
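Creating the file in an editor and only then running chmod leaves a short window in which the password sits in a file other users can usually read. A sketch of a non-interactive alternative that sets umask 077 so the file is created with mode 600 from the start; the function name write_sasl_passwd is hypothetical, and the line it writes follows the {mailgun smtp domain} {mailgun smtp login}:{mailgun smtp password} format above:

```shell
#!/bin/sh
# Hypothetical helper: write a postfix SASL credentials file without ever
# leaving the password in a file other users can read.
# Usage: write_sasl_passwd FILE SMTP_DOMAIN SMTP_LOGIN SMTP_PASSWORD
write_sasl_passwd() (
    umask 077                 # new files get mode 600; the subshell keeps this local
    : > "$1" || exit 1        # create FILE, or empty it if it already exists
    chmod 600 "$1"            # tighten an existing file before writing secrets
    # Format: "<smtp domain> <login>:<password>", no space after the colon.
    printf '%s %s:%s\n' "$2" "$3" "$4" > "$1"
)
```

Run it as root with /etc/postfix/sasl_passwd as FILE, then run postmap as above. Typing the password as a command-line argument can leave it in your shell history.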

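Configure sender address mapping, so that alerts leave the server from an address on your Mailgun domain. Create the map file

sudo nano /etc/postfix/generic

with a line that rewrites mail from your server’s mail name (the $HOSTNAME you chose during installation, myserver below) to an address on your Mailgun domain (example.com below):

@myserver no-reply@example.com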

Then load it into postfix:

sudo postmap /etc/postfix/generic

Finally, configure postfix by adding these lines to /etc/postfix/main.cf (or editing the corresponding lines for any settings that already exist). Replace smtp.mailgun.org with your mailgun SMTP domain.

sudo nano /etc/postfix/main.cf

relayhost = [smtp.mailgun.org]:587
mydestination = localhost.localdomain, localhost
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_sasl_tls_security_options = noanonymous
smtp_sasl_mechanism_filter = AUTH LOGIN
smtp_tls_note_starttls_offer = yes
smtp_generic_maps = hash:/etc/postfix/generic
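Postfix’s own postconf -e command performs this same "update the setting if it exists, otherwise add it" edit, e.g. sudo postconf -e 'relayhost = [smtp.mailgun.org]:587'. To make that behaviour concrete, here is a rough plain-shell sketch of it using awk; set_cf_param is a hypothetical name, and unlike postconf it doesn’t validate parameter names or handle multi-line values, so on a real server postconf -e is the safer choice:

```shell
#!/bin/sh
# Hypothetical helper mimicking postconf -e "NAME = VALUE" on a main.cf-style
# FILE: replace the first NAME line, drop later duplicates, or append if absent.
# Simplifications: NAME is used as a regex (fine for postfix's a-z, 0-9, _
# names), VALUE must not contain backslashes, and indented continuation
# lines of an old multi-line value would be left behind.
set_cf_param() (
    file="$1" name="$2" value="$3"
    tmp="$file.tmp.$$"
    if awk -v name="$name" -v value="$value" '
        $0 ~ ("^" name "[ \t]*=") {
            if (!seen) { print name " = " value; seen = 1 }
            next                      # skip the old line and any duplicates
        }
        { print }                     # keep every other line unchanged
        END { if (!seen) print name " = " value }
    ' "$file" > "$tmp"; then
        cat "$tmp" > "$file"          # cat keeps FILE's owner and mode
    fi
    rm -f "$tmp"
)
```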

Restart postfix to load the new config:

sudo systemctl restart postfix

You should now be able to send mail from your server! To test it, send an email to you@example.com with the mail command (on Ubuntu it’s provided by the mailutils package, so run sudo apt install mailutils if mail isn’t found):

echo "Test message from postfix" | mail -s "Test message" you@example.com

If you don’t get an email within a few seconds, something’s broken! Check your spam folder, the Mailgun UI, and the postfix logs in /var/log/mail*.
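Postfix logs a line for each delivery attempt, including a status= field (such as sent, deferred or bounced) that usually shows where a test message got stuck. A small sketch that tallies those values in a log file; mail_status_counts is a hypothetical name, and on Ubuntu the log it would typically read is /var/log/mail.log:

```shell
#!/bin/sh
# Hypothetical helper: count postfix delivery outcomes in a mail log.
# Prints one "<status> <count>" line per distinct status=... value, sorted.
mail_status_counts() (
    awk '
        match($0, /status=[a-z]+/) {
            # Skip the 7 characters of "status=" to keep just the word.
            count[substr($0, RSTART + 7, RLENGTH - 7)]++
        }
        END { for (s in count) print s, count[s] }
    ' "$1" | sort
)
```

For a deferred or bounced entry, the text in parentheses after the status on that log line usually gives the reason. Reading /var/log/mail.log may require sudo.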

mdadm

Now that we can send email from our server, the next step is to tell our disk monitoring software to use it. We’ll start with mdadm, which monitors the health of your RAID array.

Edit the mdadm config

sudo nano /etc/mdadm/mdadm.conf

and add or modify the MAILADDR setting so it points at the address alerts should be sent to:

MAILADDR you@example.com

Then do a quick test with

sudo mdadm --monitor --test --oneshot /dev/md0

(replacing /dev/md0 with your RAID array’s device). You should get an email in your you@example.com inbox.
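If the test email doesn’t arrive, it’s worth confirming which address mdadm will actually use. A sketch that prints the first uncommented MAILADDR value from an mdadm.conf-style file and exits non-zero if there isn’t one; mdadm_mailaddr is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: print the alert recipient from an mdadm.conf-style
# file, or exit non-zero if no uncommented MAILADDR line exists.
mdadm_mailaddr() (
    awk '
        # Commented lines have "#..." as their first field, so they never match.
        $1 == "MAILADDR" { print $2; found = 1; exit }
        END { exit !found }
    ' "$1"
)
```

For example: mdadm_mailaddr /etc/mdadm/mdadm.conf || echo "no MAILADDR set".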

That’s all you need to do for mdadm to email you about any errors found during a check. However, it’s not uncommon for mdadm to be misconfigured so that it isn’t performing checks at all! So it’s worth confirming that regular checks are scheduled. How they’re scheduled depends on your OS, but on Ubuntu 22.04 you can look for an entry in the output of

sudo systemctl list-timers mdcheck_start
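Between scheduled checks, /proc/mdstat gives a quick view of array health: each redundant array’s status line includes a bracketed field such as [UU] with one character per member device, U for in use and _ for missing or failed. A sketch that lists arrays whose field contains an underscore; degraded_md_arrays is a hypothetical name, and it takes an optional file path (defaulting to /proc/mdstat) so it can be tried on a saved copy:

```shell
#!/bin/sh
# Hypothetical helper: print the names of md arrays whose member-status
# field in an mdstat-format file (e.g. [U_]) shows a missing device.
degraded_md_arrays() (
    awk '
        # Array header lines look like "md0 : active raid1 sdb1[1] sda1[0]".
        $1 ~ /^md[0-9]+$/ && $2 == ":" { array = $1 }
        # Status fields contain only U and _ inside square brackets.
        match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0 && array != "") print array
        }
    ' "${1:-/proc/mdstat}"
)
```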

smartd

While mdadm will tell you when a disk has failed, there may be advance warning of the failure in the disk’s SMART statistics. smartd is a service that monitors these statistics and can email you when they indicate a problem.

You may need to install smartmontools first if the smartd command isn’t found:

sudo apt install smartmontools

Modify the configuration file:

sudo nano /etc/smartd.conf

The file contains lots of commented-out example configurations, plus possibly an uncommented line beginning with DEVICESCAN. smartd can only handle a single DEVICESCAN directive, so comment out any existing one, then add

DEVICESCAN -o on -H -l error -l selftest -t -M test -m you@example.com

This line does the following:

  • -o on: Enable the drive’s automatic offline data collection (ATA drives only).
  • -H: Check the drive’s overall SMART health status (a failing status means the drive predicts its own failure).
  • -l error -l selftest: Report new errors in the SMART error log and new failures in the self-test log.
  • -t: Track changes in all SMART attributes (both prefailure and usage).
  • -m you@example.com: Send email alerts to this address.
  • -M test: Send a test email when smartd is started.
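Because smartd only handles a single DEVICESCAN directive, a quick count of uncommented DEVICESCAN lines can catch a default line you forgot to comment out. A sketch; count_devicescan is a hypothetical name, and commented-out lines are skipped because their first field isn’t exactly DEVICESCAN:

```shell
#!/bin/sh
# Hypothetical helper: count uncommented DEVICESCAN directives in a
# smartd.conf-style file (prints 0 if there are none).
count_devicescan() (
    awk '$1 == "DEVICESCAN" { n++ } END { print n + 0 }' "$1"
)
```

For example, count_devicescan /etc/smartd.conf should print 1 after your edit.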

To test this setup, restart the smartd service:

sudo systemctl restart smartd

You should get one email for each disk. You can leave the config as-is, or remove -M test from /etc/smartd.conf to get email alerts only for errors (not on every service restart).
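Removing -M test can be scripted too. A sketch using sed to drop the flag from uncommented DEVICESCAN lines only, writing through a temporary file instead of sed -i (whose syntax differs between GNU and BSD sed); drop_smartd_test_flag is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: strip the "-M test" flag from uncommented
# DEVICESCAN lines in a smartd.conf-style file, leaving other lines alone.
drop_smartd_test_flag() (
    file="$1"
    tmp="$file.tmp.$$"
    if sed '/^[[:space:]]*DEVICESCAN/s/[[:space:]]-M[[:space:]][[:space:]]*test//' \
        "$file" > "$tmp"; then
        cat "$tmp" > "$file"   # cat keeps FILE's owner and mode
    fi
    rm -f "$tmp"
)
```

Run it as root on /etc/smartd.conf, then restart smartd as above.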