Prioritize alerts with server monitoring tools

Today’s servers are equipped with a dizzying array of sensors and can produce an incredible variety of alerts. However, an important lesson administrators learn early on is that alerts are not created equal – not all alerts generated by server monitoring tools are actually important. If the servers are configured to notify you every time an alert is triggered, you will receive so many pop-up notifications that really important alerts could go unnoticed. This tip will help administrators determine which alerts are really important and how they want server monitoring tools to notify them of those alerts.

A note on setting up and configuring alerts
Before I begin, I want to point out that there really is no right or wrong way to configure alerts. The recommendations in this tip are based on my two decades of IT experience, but ultimately it comes down to personal preference. While I hope you find my recommendations useful, each administrator should configure server alerts in a way that meets the unique requirements of their own organization.

The other thing to note is that there are many different ways an administrator can generate alerts. Some servers can generate alerts at the hardware level. These capabilities can be useful, but they are far from the only alert mechanism available. Monitoring tools from server vendors can provide a wealth of information, as can operating system level monitoring tools, such as Microsoft’s System Center Operations Manager. Because there are so many options for server monitoring and alerting, I will take a generalized approach to the topic rather than focusing on specific server monitoring tools.

Prioritize server alerts
The key to effective server monitoring is to prioritize the alerts generated by server monitoring tools. I recommend classifying each type of alert as high, medium, or low priority.

I treat anything that is absolutely critical as a high priority alert. For example, running out of disk space on a server would be a critical event. The failure of a clustered application server would also be a critical event.

Medium priority alerts are a bit more difficult to define. The events that I consider medium priority would probably be defined as high priority by some organizations. I tend to treat an event as medium priority if the condition that caused the alert is not actually causing an outage. For example, if a node in a cluster goes offline for some unknown reason, but the cluster as a whole continues to operate, I would consider this a medium priority. Of course, this has a lot to do with the type of environment I work in. I have worked for large companies that would treat a cluster node failure as a critical event.
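The rule of thumb above can be written down as a small classifier. This is only a minimal Python sketch of my own rule, not something taken from any particular monitoring tool. The Alert class, its two flags, and the classify function are hypothetical names, and the "low" fallback is an assumption, since how you define low priority alerts depends on your environment.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record; the field names are illustrative only."""
    name: str
    causes_outage: bool        # is a service actually unavailable right now?
    degrades_redundancy: bool  # was a redundant component lost?


def classify(alert: Alert) -> str:
    """Return 'high', 'medium', or 'low' for an alert."""
    if alert.causes_outage:
        return "high"    # absolutely critical: something is down
    if alert.degrades_redundancy:
        return "medium"  # still running, but with less protection
    return "low"         # assumption: everything else is low priority


# A clustered application failing outright is an outage.
app_down = Alert("clustered application failed", True, True)
# One node offline while the cluster keeps running is not an outage.
node_down = Alert("cluster node offline", False, True)

print(classify(app_down), classify(node_down))
```

An organization that treats a cluster node failure as critical would simply return "high" in the second branch as well.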

If you happen to work for an organization that does not tolerate downtime, it might be a good idea to prioritize these types of alerts based on whether a potential single point of failure remains. For example, suppose you have a RAID array that can handle the failure of two drives without going offline. If only one drive in the array fails, you can treat the event as a medium priority alert because the array can still tolerate another drive failure without data loss. However, if two drives fail, you might consider this a high priority alert, as the failure of one additional drive would cause the entire array to fail.
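The RAID example boils down to comparing the number of failed components with the number of failures the system can absorb. Here is a minimal Python sketch of that comparison. The function name and its parameters are hypothetical, and a real monitoring product may or may not expose these counts.

```python
from typing import Optional


def priority_from_redundancy(failed: int, tolerated: int) -> Optional[str]:
    """Map a failure count to a priority.

    failed:    number of components (e.g. drives) that have failed
    tolerated: number of failures the system can absorb and keep running
    """
    if failed < 0 or tolerated < 0:
        raise ValueError("counts must not be negative")
    if failed == 0:
        return None      # nothing has failed, so there is no alert
    if failed >= tolerated:
        return "high"    # one more failure (or this one) takes it down
    return "medium"      # spare fault tolerance still remains


# The array from the example tolerates two drive failures.
print(priority_from_redundancy(failed=1, tolerated=2))
print(priority_from_redundancy(failed=2, tolerated=2))
```

With the array from the example, one failed drive maps to medium and two failed drives map to high.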

While I tend to think of this as a great way to prioritize alerts, it is much more difficult to configure alerts based on the number of components that have failed than to simply trigger an alert when a failure occurs. Depending on the type of monitoring you are performing and the features available in your particular monitoring software, setting up this type of alert may not even be an option.

Configuring the alert mechanism
Once you have determined how the different types of alerts should be classified, you will need to decide how you want to be notified of alerts. My personal preference is for the server monitoring tools to send high priority alerts to my cell phone via text message. I have my cell phone with me most of the time, so sending critical alerts to my phone is the best way to make sure I get the alert as quickly as possible.

Since medium priority alerts are important, but not absolutely critical, I prefer to send these alerts to my email. As you can see in Figure A, Windows Server has native email alerting capabilities, which means you can easily send email alerts based on any event that may occur in the operating system.

Figure A

Windows is able to natively send alerts by e-mail.

I tend to check my email several times a day, which means that an alert sent to my email won’t go unnoticed, but I probably won’t see it as quickly as I would if it were sent to my cell phone. This is an important distinction, because the last thing I want is to be bothered by a non-critical server alert when I’m out with friends on the weekend. Of course, this is just one example of how alerts can be sent. Many other options exist. For example, a company named Server Density offers an iPhone server monitoring app with full alert support.
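The notification preferences described in this section amount to a lookup from priority to delivery channel. The Python sketch below illustrates the idea under stated assumptions. The three send functions are hypothetical stubs that only record what they would have sent, where a real setup would call an SMS gateway or a mail server instead. Writing low priority alerts to a log is my own assumption, since this tip does not prescribe a channel for them.

```python
from typing import Callable, Dict, List, Tuple

# Records (channel, message) pairs in place of real deliveries.
outbox: List[Tuple[str, str]] = []


def send_sms(message: str) -> None:
    outbox.append(("sms", message))    # stub for a text message gateway


def send_email(message: str) -> None:
    outbox.append(("email", message))  # stub for a mail server


def write_log(message: str) -> None:
    outbox.append(("log", message))    # assumption: low priority is logged


# High goes to the phone, medium to email, low to a log.
ROUTES: Dict[str, Callable[[str], None]] = {
    "high": send_sms,
    "medium": send_email,
    "low": write_log,
}


def notify(priority: str, message: str) -> None:
    """Deliver a message through the channel assigned to its priority."""
    handler = ROUTES.get(priority)
    if handler is None:
        raise ValueError(f"unknown priority: {priority!r}")
    handler(message)


notify("high", "Disk space exhausted")
notify("medium", "Cluster node offline")
print(outbox)
```

Keeping the routing in a single table makes it easy to change a channel later without touching the rules that assign priorities.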

Clearly, what constitutes a high priority alert is open to debate. One more thing to consider, however, is that high priority alerts may not always be related to system failures. For example, most servers can trigger an alert whenever the system enclosure is opened. If no one other than you is supposed to open server enclosures, then an enclosure alarm could very well be a high priority alert. Likewise, an over-temperature alert can also be considered high priority, because a server that gets too hot will eventually shut down.

About the Author: Brien Posey is a seven-time Microsoft MVP with two decades of IT experience. During this time, he published several thousand articles and wrote or contributed to dozens of computer books. Prior to becoming a freelance writer, Posey was CIO for a national chain of hospitals and healthcare facilities. He has also worked as a network administrator for some of the largest insurance companies in the country and for the Department of Defense at Fort Knox.

