Run a monitoring system bake-off to find one or more successors to https://xymon.ccs.neu.edu: systems with a larger install base (read: larger community) and more features/flexibility (eg: plugins, monitoring capability).
Whichever monitoring system(s) we select must (at minimum) be able to:
- Monitor:
  - network services (eg: ping, ssh, RDP, http, smtp, imap, etc)
  - host state (eg: CPU/RAM/Disk usage, processes, uptime, etc) on Linux & Windows
  - network devices (eg: SNMP or similar)
  - NetApp(s) (SNMP or ssh)
  - arbitrary conditions on hosts (eg: custom monitoring of processes/filesystem/etc)
  - arbitrary services (eg: mail round trip time, interactive ssh responsiveness, etc)
  - SSL certs
- Send alerts via at least some of:
- Graph time series data for monitored hosts and services (eg: uptime, response time, resource usage, etc)
- Be configured from external sources (eg: configs can be autogenerated from hostbase, Puppet, etc)
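
As a reference point during the bake-off, the SSL cert requirement in the list above can be spot-checked independently of any candidate system. The following is a minimal sketch using only the Python standard library, not a proposed implementation. The function names and the 30-day/7-day thresholds are assumptions made for illustration and do not come from any of the systems under evaluation.

```python
import socket
import ssl
import time


def days_until_expiry(not_after, now=None):
    """Return days remaining before a cert's notAfter time (negative if expired).

    `not_after` is a string in the format returned by getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


def classify(days_left, warn_days=30, crit_days=7):
    """Map days remaining to a status string. Thresholds are assumed defaults."""
    if days_left < crit_days:
        return "critical"
    if days_left < warn_days:
        return "warning"
    return "ok"


def fetch_not_after(host, port=443, timeout=10.0):
    """Connect to host:port over TLS and return the cert's notAfter string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For a given host, `classify(days_until_expiry(fetch_not_after(host)))` gives a baseline result to compare against what each candidate system reports for the same certificate.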
Note: There are those who argue (persuasively, in Chris A's view) that monitoring for the purpose of alerting about problems (eg: "wake someone up so they fix it") and monitoring for the purpose of graphing trends over time (eg: track resource consumption, plan new purchases/deployments/etc) are two fundamentally different jobs, and are best served by different systems. It would obviously be easier to deploy only one system that does both tasks, but if the choice is between one system that does two tasks poorly and two systems that each do one task well, we should keep this in mind.
Note: These are options to consider. You should not limit yourself to just these systems, nor should you assume that all of these systems are good options.