If you do not have a dedicated hardware RAID controller, you need to configure and start two utilities: smartd and mdadm. The smartd daemon reads S.M.A.R.T. health data directly from the hard drives and sends alerts when anything changes. Similarly, mdadm watches your Linux software RAID arrays for problems.
If you are using a hardware RAID controller, then it manages some of these tasks. However, you must be sure to properly configure the automated alerts within the controller’s management interface – check the manual for full instructions. Additionally, you may be able to monitor hard drive health data if the controller supports it (3ware and ARECA cards are known to work) – see the smartd man page.
smartd
Here are the standard entries I use in /etc/smartd.conf:
Code:
/dev/sda -a -d ata -m [email protected] -H -l error -l selftest -M test -o on -S on -s (S/../../3/03|L/../15/./04)
/dev/sdb -a -d ata -m [email protected] -H -l error -l selftest -o on -S on -s (S/../../3/04|L/../15/./05)
These lines are fairly convoluted. In this example, they monitor drives /dev/sda and /dev/sdb by performing the following tasks:
- E-mail all alerts to [email protected]
- Send one test e-mail upon startup (the -M test option, set here on sda only)
- Watch for any critical failure warnings in the SMART data
- Monitor the results of hard drive self tests
- Enable Automatic Offline Testing of the drives
- Run a short self test on each drive once a week (Wednesdays; 3am for sda, 4am for sdb)
- Run a long self test on each drive once a month (on the 15th; 4am for sda, 5am for sdb)
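If you have more than a couple of drives, writing these entries by hand gets tedious. Here is a minimal sketch of a helper that prints one entry per drive with staggered test hours. The function name smartd_entry and the address admin@example.com are placeholders of my own, not part of smartmontools, and the sketch leaves out -M test, so add that by hand to one entry if you want the startup e-mail.

```shell
#!/bin/sh
# Hypothetical helper: print one smartd.conf entry for a drive.
#   $1 = device, $2 = alert address (placeholder),
#   $3 = two-digit hour (00-22) for the weekly short test.
# The monthly long test is scheduled one hour after the short test's hour.
smartd_entry() {
    dev=$1
    addr=$2
    short_hour=$3
    # Strip one leading zero so the shell does not read "08" as octal.
    long_hour=$(printf '%02d' $(( ${short_hour#0} + 1 )))
    printf '%s -a -d ata -m %s -H -l error -l selftest -o on -S on -s (S/../../3/%s|L/../15/./%s)\n' \
        "$dev" "$addr" "$short_hour" "$long_hour"
}

smartd_entry /dev/sda admin@example.com 03
smartd_entry /dev/sdb admin@example.com 04
```

Redirect the output into a scratch file and review it before copying anything into /etc/smartd.conf.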
Once you have written the configuration file, you need to start the smartd service.
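The exact start command depends on your init system. As a rough sketch, the helper below only prints the command it would pick, so you can check it before running it as root. The function name start_cmd is my own, and I am assuming the service is called smartd; some distributions package it under another name (smartmontools, for example).

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a command that would start
# the given service, based on which tools are present on this machine.
start_cmd() {
    svc=$1
    if command -v systemctl >/dev/null 2>&1; then
        echo "systemctl start $svc"       # systemd-based systems
    elif command -v service >/dev/null 2>&1; then
        echo "service $svc start"         # SysV-style wrapper
    else
        echo "/etc/init.d/$svc start"     # plain init script
    fi
}

start_cmd smartd
```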
To ensure the service starts at boot, you’ll need to add it to the boot sequence. The exact command depends upon your Linux distribution:
Code:
chkconfig --add smartd (Red Hat, Fedora and SUSE)
rc-update add smartd default (Gentoo)
update-rc.d smartd defaults (Debian)
mdadm
To monitor Linux software RAIDs, you’ll need at least the following lines in /etc/mdadm.conf:
Code:
DEVICE /dev/sd[ab]1 /dev/sd[ab]5 /dev/sd[ab]6 /dev/sd[ab]7 /dev/sd[ab]8 /dev/sd[ab]10
ARRAY /dev/md1 devices=/dev/sda1,/dev/sdb1
ARRAY /dev/md5 devices=/dev/sda5,/dev/sdb5
ARRAY /dev/md6 devices=/dev/sda6,/dev/sdb6
ARRAY /dev/md7 devices=/dev/sda7,/dev/sdb7
ARRAY /dev/md8 devices=/dev/sda8,/dev/sdb8
ARRAY /dev/md10 devices=/dev/sda10,/dev/sdb10
MAILADDR [email protected]
Using this example, any changes to the listed md devices will be immediately e-mailed to [email protected].
Note that some newer versions of mdadm require that devices be identified by UUID (e.g. f4849d33:f8c1ce1c:ac28ac18:9d4741e7) rather than raw device name (e.g. /dev/md1). If this is the case, run mdadm --detail on each RAID (e.g. mdadm --detail /dev/md1) and use the UUID it reports in place of the devices= list.
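To save some copying and pasting, here is a small sketch that turns the UUID line from mdadm --detail into an ARRAY line. The function name array_line is my own invention, and the sample input is only assumed to match the "UUID : ..." line your version of mdadm prints, so compare it against real output first.

```shell
#!/bin/sh
# Hypothetical helper: read `mdadm --detail` output on stdin and print
# a UUID-based ARRAY line for the array named in $1.
array_line() {
    awk -v dev="$1" '$1 == "UUID" && $2 == ":" { print "ARRAY " dev " UUID=" $3; exit }'
}

# Real use (as root):  mdadm --detail /dev/md1 | array_line /dev/md1
# Demonstration with a sample line in the assumed output format:
printf '           UUID : f4849d33:f8c1ce1c:ac28ac18:9d4741e7\n' | array_line /dev/md1
```

Alternatively, mdadm --detail --scan prints ARRAY lines for all running arrays in one go.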
Once you have written the configuration file, you need to start the monitoring service. Some distributions name it mdadm and others name it mdmonitor.
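If you are not sure which name your distribution uses, a quick check of the init script directory will tell you. This is only a sketch: md_service_name is a made-up helper, and it assumes SysV-style scripts under /etc/init.d (a systemd-only system will have unit files instead).

```shell
#!/bin/sh
# Hypothetical helper: report which RAID monitoring init script exists.
#   $1 = init script directory (defaults to /etc/init.d)
md_service_name() {
    dir=${1:-/etc/init.d}
    for name in mdadm mdmonitor; do
        if [ -e "$dir/$name" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1
}

if svc=$(md_service_name); then
    echo "start it with: /etc/init.d/$svc start"
else
    echo "no mdadm or mdmonitor script found in /etc/init.d" >&2
fi
```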
To ensure the service starts at boot, you’ll need to add it to the boot sequence. The exact command depends upon your Linux distribution:
Code:
chkconfig --add mdadm (Red Hat, Fedora and SUSE)
rc-update add mdadm default (Gentoo)
update-rc.d mdadm defaults (Debian)
As always, be certain that you use your own e-mail address and the names of the actual hard drives and arrays in your system.
This resource came from:
http://www.microway.com/hpc-tech-tips/monitoring-hard-drive-and-raid-health/