Even though the hard disks are monitored via S.M.A.R.T. and the status of the ZFS pools is reported regularly by the periodic scripts, there is still no immediate notification when one of the ZFS pools has a problem. If any ZFS-specific problem occurs, we want to know about it immediately. A small script plus an extension of the Monit configuration takes care of this.
NEW: For the very impatient, there is a console-only section with just the commands and no explanations.
Last update:
I drew inspiration for this script from the following sources:
The resulting script is a very lean modification with the following changes:
Scrub and Trim have been removed, as these are already handled by the periodic scripts.
First, the following additional check is added at the bottom of the Monit configuration with ee /usr/local/etc/monitrc:
check program zfs_health with path "/root/zfs_health_check.sh"
if status != 0 then alert
The script stored there is created with ee /root/zfs_health_check.sh and the following content:
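#!/bin/sh
# Print a short overview of all pools; Monit includes this output in its alert
/usr/bin/printf "%s\n\n" "$(/sbin/zfs list -o name,avail,used -d 0)"
/usr/bin/printf "%s\n\n" "$(/sbin/zpool list -o name,size,allocated,free,capacity,health)"

# Check every imported pool
LISTPOOLS="$(/sbin/zpool list -H -o name)"
for POOL in ${LISTPOOLS}; do
    # Pool state, e.g. ONLINE, DEGRADED or FAULTED
    HEALTH="$(/sbin/zpool list -H -o health "${POOL}")"
    # Second word of the "errors:" line; "No" means no known data errors
    ERROR="$(/sbin/zpool status "${POOL}" | /usr/bin/awk '/^errors:/ {print $2}')"
    if [ "${HEALTH}" != "ONLINE" ]; then
        exit 1
    fi
    if [ "${ERROR}" != "No" ]; then
        exit 1
    fi
done
# All pools are healthy
exit 0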
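The ERROR line extracts the second word of the errors: line in the zpool status output, which is No for a clean pool. The sketch below runs the same extraction, written as a single awk pattern, on hard-coded sample text and adds a helper that applies both of the script's conditions. The helper names errors_word and pool_is_healthy and the pool name tank are hypothetical, and the sample output is illustrative rather than captured from a real system.

```shell
#!/bin/sh
# Hypothetical helper: read "zpool status" output on stdin and print the
# second word of the "errors:" line ("No" for a clean pool).
errors_word() {
    awk '/^errors:/ {print $2}'
}

# Hypothetical helper applying the script's two conditions:
# $1 = pool health (e.g. ONLINE), $2 = word printed by errors_word
pool_is_healthy() {
    [ "$1" = "ONLINE" ] && [ "$2" = "No" ]
}

# Illustrative sample output, not captured from a real system
CLEAN="$(printf '%s\n' '  pool: tank' ' state: ONLINE' 'errors: No known data errors')"
BROKEN="$(printf '%s\n' '  pool: tank' ' state: ONLINE' "errors: 2 data errors, use '-v' for a list")"

printf '%s\n' "${CLEAN}" | errors_word    # prints: No
printf '%s\n' "${BROKEN}" | errors_word   # prints: 2

# A pool can be ONLINE and still report data errors, hence the second test
if pool_is_healthy ONLINE "$(printf '%s\n' "${BROKEN}" | errors_word)"; then
    echo "healthy"
else
    echo "alert"                          # this branch is taken
fi
```

Checking the health state alone would miss the second sample, which is why the script tests both values.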
Finally, make the script executable with chmod +x /root/zfs_health_check.sh and restart Monit with service monit restart.
If an error occurs, Monit will then send an e-mail with the details of the error and the affected ZFS pool.
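The alert decision itself depends only on the exit status. With if status != 0 then alert, exit status 0 counts as healthy and any other value triggers the e-mail, while the printed overview supplies the details. The sketch below imitates that contract with a hypothetical shell function run_check and stand-in commands instead of the real script. It is an illustration under these assumptions, not how Monit works internally.

```shell
#!/bin/sh
# Hypothetical illustration of the "check program" contract, not Monit's code:
# run a command, keep its output, and raise an alert when the exit status
# is non-zero (the equivalent of "if status != 0 then alert").
run_check() {
    output="$("$@" 2>&1)"
    rc=$?
    if [ "${rc}" -ne 0 ]; then
        printf 'ALERT (status %d)\n%s\n' "${rc}" "${output}"
        return 1
    fi
    printf 'OK\n'
    return 0
}

# Stand-in commands instead of the real script
run_check true                                               # prints: OK
run_check sh -c 'echo "fakepool is DEGRADED"; exit 1' || :   # prints the ALERT line and the output
```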
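The script can also be executed manually with /root/zfs_health_check.sh. It prints the overview of all pools, and its exit status shows the result (0 means all pools are healthy).
To test the alert, a file-backed test pool named fakepool is used. Taking one of its devices offline with zpool offline fakepool /poolfiles/file1 puts the pool into the DEGRADED state, so the script exits with status 1 and Monit sends an alert.
Bringing the device back online with zpool online fakepool /poolfiles/file1 returns the pool to ONLINE, and the check passes again.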
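Without a spare test pool, the loop logic can still be exercised against a stub. The sketch below is a hypothetical variant of the check loop in which the zpool command is taken from a ZPOOL variable (defaulting to /sbin/zpool). fake_zpool is a made-up stub that answers only the zpool calls the loop uses: it reports a single pool, fakepool, whose health comes from the equally hypothetical FAKE_HEALTH variable.

```shell
#!/bin/sh
# Hypothetical variant of the check loop with a configurable zpool command
check_pools() {
    zpool_cmd="${ZPOOL:-/sbin/zpool}"
    for pool in $(${zpool_cmd} list -H -o name); do
        health="$(${zpool_cmd} list -H -o health "${pool}")"
        error="$(${zpool_cmd} status "${pool}" | awk '/^errors:/ {print $2}')"
        if [ "${health}" != "ONLINE" ] || [ "${error}" != "No" ]; then
            return 1
        fi
    done
    return 0
}

# Made-up stub that answers only the calls used in check_pools
fake_zpool() {
    case "$1" in
        list)
            # "list -H -o name" -> pool names, "list -H -o health POOL" -> health
            if [ "$4" = "name" ]; then echo fakepool; else echo "${FAKE_HEALTH}"; fi
            ;;
        status)
            echo "errors: No known data errors"
            ;;
    esac
}

ZPOOL=fake_zpool
FAKE_HEALTH=ONLINE
check_pools && echo "healthy" || echo "alert"     # prints: healthy
FAKE_HEALTH=DEGRADED
check_pools && echo "healthy" || echo "alert"     # prints: alert
```

Injecting the command through a variable keeps the real code path unchanged when ZPOOL is unset, which is the usual way to test shell scripts that call system tools.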
Voilà