Tag Archives: Nagios Plugins

Integrate Apprise into Nagios for More Notification Support

Introduction

Apprise is an open source tool that allows you to send a notification through a wide range of messaging services out there (such as Discord, Slack, Telegram, Microsoft Teams, etc). Well when you combine this with Nagios, you open it up to a much larger scope then simply emailing on an alert.

With Apprise you can configure Nagios to text your mobile phone using Amazon’s Web Service, or notify your devops team on Slack and/or Microsoft Teams. You can even trigger an IFTTT event. But it doesn’t just stop there, Apprise already supports over 35+ notification services today (and is always expanding) which means Nagios could leverage all of this too. This blog will explain how you can set up your instance of Nagios to notify more end points then just email.

The Installation

I’m going to presume you have a copy of Nagios already installed. If you don’t you can check out my blog here on how to set up your own copy with CentOS 7. Those who are not using CentOS are certainly not out of luck though, there are lots of blogs out there to get you started.

This blog will assume you have root privileges or have sudoers privileges.

Apprise can be easily added to your system through pip:

# Install Apprise onto the system currently also hosting Nagios
sudo pip install apprise

Configure Nagios

The Nagios configuration files can vary in their location depending on what Linux distribution you’re using. I’m going to just refer to some standard paths used by the stuff I host here (for CentOS/RedHat).

Step 1: Nagios Import Directory

If you’re using the Nuxref RPMs, then you can skip this step and move to the next as you’ll already be configured for this. Those using another distribution will want to update their nagios.cfg to point to a directory we can use to drop in and remove configuration from. The file is presumably going to be located as: /etc/nagios/nagios.cfg):

# Place this anywhere in /etc/nagios/nagios.cfg
# preferably put it near the bottom of the file.

# Definitions for global configuration directory
cfg_dir=/etc/nagios/conf.d

Now make sure this directory exists because this is where we’ll place our new apprise configuration:

# Ensure our global include directory exists that we
# just defined in our nagios.cfg file:
mkdir -p /etc/nagios/conf.d

# Place a dummy file in here so that Nagios doesn't
# throw any errors (as it isn't a fan of include directories
# without configuration files in it).
touch /etc/nagios/conf.d/dummy.cfg

Step 2: Apprise/Nagios Integration

Now we need to let Nagios know about Apprise. We’ll do this by creating the following files called /etc/nagios/conf.d/apprise.cfg

#
# Apprise to Nagios Configuration File
# Place this file as /etc/nagios/conf.d/apprise.cfg
#
# 'notify-host-by-apprise' command definition
define command{
   command_name   notify-host-by-apprise
   command_line   /usr/bin/printf "%b" "- *Notification Type*: $NOTIFICATIONTYPE$\n- *Host*: $HOSTNAME$\n- *State*: \n- *Address*: $HOSTADDRESS$\n- *Info*: $HOSTOUTPUT$\n\n- *Date/Time*: $LONGDATETIME$\n" | /usr/bin/apprise -c /etc/nagios/apprise.yml -n "$HOSTSTATE$" -g "$NOTIFICATIONTYPE$" -t "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **"
}
 
# 'notify-service-by-apprise' command definition
define command{
   command_name   notify-service-by-apprise
   command_line   /usr/bin/printf "%b" "*Notification Type*: $NOTIFICATIONTYPE$\n- *Service*: $SERVICEDESC$\n- *Host*: $HOSTALIAS$\n- *Address*: $HOSTADDRESS$\n- *State*: $SERVICESTATE$\n- *Date/Time*: $LONGDATETIME$\n\n*Additional Info*:\n$SERVICEOUTPUT$\n" | /usr/bin/apprise -c /etc/nagios/apprise.yml -n "$HOSTSTATE$" -g "$NOTIFICATIONTYPE$" -t "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **"
}

# Register our contact template that we can reference
define contact{
  ; The name of this contact template
  name                             apprise-contact

  ; service notifications can be sent anytime
  service_notification_period      24x7

  ; host notifications can be sent anytime
  host_notification_period         24x7

  ; send notifications for all service states, flapping events,
  ; and scheduled downtime events
  service_notification_options     w,u,c,r,f,s

  ; send notifications for all host states, flapping events,
  ; and scheduled downtime events
  host_notification_options        d,u,r,f,s

  ; send service notifications via email
  service_notification_commands    notify-service-by-apprise

  ; send host notifications via email
  host_notification_commands       notify-host-by-apprise

  ; Don't register this as it is just a template for future
  ; references by contacts who wish to use the apprise plugin
  register                         0
}

Now for every contact we set up going forward, we can point it to use Apprise. By default Nagios usually provides us a contact.cfg file that contains the generic user nagiosadmin. For those using my packaging, you can find this file at /etc/nagios/objects/contacts.cfg; you’ll want to change it to looks like this:

define contact{
  ; Short name of (Nagios) user
  contact_name    nagiosadmin

  ; This next line used to read generic-contact; but we want to switch it
  ; over to our new Apprise based one:
  use             apprise-contact

  ; Full name of user
  alias           Nagios Admin

  ; not important if using apprise-contact (defined above)
  email           nagios@localhost
}

Before you advance to the next step, you’ll want to run a test flight check on your configuration and make sure it validates okay.

# Perform a flight check on our new configuration (as root)
sudo nagios -v /etc/nagios/nagios.cfg

If you get any errors, you should revisit the first part of this blog and try to iron them out before continuing. If everything is error free, then the next step is to reload our instance of Nagios (if it’s running) so it can re-read this configuration. This can be done with the command:

# You will need to be root to do this; send a SIGHUP
# to all instances of nagios running in memory:
sudo killall -HUP nagios

Step 3: Apprise Configuration

Now we need to prepare our Apprise configuration (/etc/nagios/apprise.yml) and fill it with the notification services we want listen for and who we want to pass it to.

We can associate with Nagios notifications types passed to us through tags. Nagios will pass these along one of the following $NOTIFICATIONTYPE$ when an event occurs; these are:

  • PROBLEM: There was an issue with one of the checks.
  • RECOVERY: The issue previously set has been cleared.
  • ACKNOWLEDGEMENT: An outstanding issue has been acknowledged.
  • FLAPPINGSTART: Flapping is a state where a service has a PROBLEM associated with it and then moments later has a RECOVERY. This is the state called FLAPPING. When this process occurs too many times in a row, this alert gets set.
  • FLAPPINGSTOP: The service that was previously FLAPPING is no longer doing so.
  • FLAPPINGDISABLED: Someone just disabled FLAPPING for this service/host.
  • DOWNTIMESTART: The scheduled downtime for this service/host has begun.
  • DOWNTIMEEND: The scheduled downtime is over.
  • DOWNTIMECANCELLED: Someone just cancelled the scheduled downtime for this service/host.

Knowing the above notification types that we’ll receive, here is what an Apprise configuration file located at /etc/nagios/apprise.yml might look like:

# This file should be placed in /etc/nagios/apprise.yml

# NOTE: THIS IS JUST AN EXAMPLE CONFIGURATION FILE. YOU WILL WANT
#       TO CUSTOMIZE YOUR OWN WITH THE SERVICE(S) OF YOUR CHOICE
#       VISIT https://github.com/caronc/apprise TO SEE WHAT IS
#       AVAILABLE AND HOW THEY WORK.

# Identify all of the global notification types we want to flag on.
tag:
  - PROBLEM
  - RECOVERY
  - FLAPPINGSTART
  - FLAPPINGSTOP

# Now we want to define our Apprise URLS; you'll want to visit
# https://github.com/caronc/apprise to see all of the supported
# services and how to build their URLs.
urls:

  # Maybe we want to notify a custom service we're hosting to
  # monitor and track Nagios status; Check out the following 
  # for more details https://github.com/caronc/apprise/wiki/Notify_json

  - json://localhost

  # Maybe we want to notify a Slack channel; more details on this
  # are here: https://github.com/caronc/apprise/wiki/Notify_slack
  - slack://T1JJ3T3L2/A1BRTD4JD/TIiajkdnlazkcOXrIdevi7F/#nuxref

  # the Apprise YAML configuration is quite powerful, the
  # following prepares the email URL and sends an email to each
  # user identified below:
  - email://user:password@gmail.com
      - to: george@example.com
      - to: admin@example.com

  # More details on the emails can be found here:
  # https://github.com/caronc/apprise/wiki/Notify_email

  # We can also individually disperse the tags in the same config
  # file.  The below tags will override the globals defined above.
  # A use case for this would be that maybe we just want to
  # send certain notification types to say... the DevOps team:
  - email://user:password@gmail.com
      - to: devops@example.com
        tag: DOWNTIMESTART, DOWNTIMEEND, DOWNTIMECANCELLED

Once you’re file is all ready, be sure this file is readable by Nagios (to keep it away from prying eyes), but otherwise you’re all set and ready to go!

# Here is what one might do to protect this apprise configuration
chmod 640 /etc/nagios/apprise.yml
chown nagios.root /etc/nagios/apprise.yml

Verification

To be sure everything works, you may want to just test that you got all of your configuration right You can test this using manually as follows:

# Test our configuration with apprise using the PROBLEM tag
# -vvv for some verbose debugging in-case we need it.
apprise -c /etc/nagios/apprise.yml \
    -n CRITICAL -g PROBLEM \
    -vvv \
    -t "A Test Title" \
    -b "a Test Body"

Here is a screenshot of a test error displayed on gitter.im that was sent by Nagios using Apprise:
Gitter Example

Sources

Nagios Core 4.x Setup for CentOS 7.x

Introduction

A few years ago I blogged about setting up Nagios Core 4.x for CentOS 6.x. I later made a blog on setting up NSCA and NRPE too. But times have changed so I figured it was time to do an updated blog for CentOS 7.x and Nagios.

Some new great tools I’ve packaged to make a load and go setup for everyone is:

  • Nagios Core v4.x: updated RPM packaging which I continued to maintain and carry forward from my previous blogs.
  • Nagios Plugins v2.x: A ton of out of the box working plugins.

Best of all, with my RPMs, you can run SELinux in full Enforcing mode for that extra piece of mind from a security standpoint!

Nagios Core

Nagios (for those who don’t know) is an application that allows us to monitor other system/applications we manage. It’s primary function is to immediately bring to our attention any outage or anomaly is detected with our systems. This tool is completely free and should be an essential component of anyone’s business infrastructure.

The current version of Nagios (at the time of writing this blog) is v4.2.2. You can download the latest version from my repository (if you’re set up) as follows:

# Install Nagios Core using NuxRef repositories
# at: https://nuxref.com/repo
yum install -y nagios nagios-selinux

You can also download the packages manually if you wish using this table:

Package Download Description
nagios el7.rpm Nagios Core IV is the the actual monitoring server we can use to monitor our applications.
nagios-selinux el7.rpm An add-on package that allows you to run Nagios in Enforcing Mode.

Note: This RPM is not required by Nagios to run correctly.
nagios-contrib el7.rpm Extra tools that add to the great features Nagios already offers (such as distributed monitoring). These tools are not discussed in this blog entry; but maybe useful to you.
Note: This RPM is not required by Nagios to run correctly.
nagios-devel el7.rpm Header files for developers who want to build using the libnagios shared library.
Note: This RPM is not required by Nagios to run correctly.

Note: The source rpm can be obtained here which builds everything you see in the table above. It’s not required for the application to run, but might be useful for developers or those who want to inspect how I put the package together.

Configure Nagios Core

Once Nagios is installed it can be started like so:

# Start Nagios
systemctl start nagios.service

# If you want Nagios to start after the system is rebooted, you
# can type the following:
systemctl enable nagios.service

# Now we want to turn on Apache if it's not running, otherwise
# reload the configuration if it is:
systemctl status httpd.service && \
   systemctl start httpd.service ||\
   systemctl reload httpd.service

If you followed the instructions above you should be able to access the (Nagios) monitoring website right away by visiting http://localhost/nagios. If you’re installing this on another server you may need to open your web ports to access the Nagios Monitoring site:

# The following commands should be ran on your Nagios Server.
# It will enable our http (and secure https) port on our firewall so our
# monitoring website can be accessed remotely:
firewall-cmd --permanent --add-service=http
firewall-cmd --add-service=http
firewall-cmd --permanent --add-service=https
firewall-cmd --add-service=https
Nagios Credentials
Login nagiosadmin
Password nagiosadmin

You’ll be prompted for a user/pass combination; at this time the default values are defined in the table:

Once you’ve logged in, you can click on links like Services (under the Current Status heading) which will list to you all of the system services you’re currently monitoring and their status:

Nagios > Current Status > Services
Nagios > Current Status > Services

In the example, you can see Nagios has picked up on a high system load (denoted by the yellow warning entry).

Nagios at it’s most basic level is set up at this point and can be maintained by having a look at the following directories:

/etc/nagios/nagios.cfg:
The main configuration file that is read by Nagios when starting up. The only things you may want to change in here are:

Directive Description
date_format The default is us (MM-DD-YYYY HH:MM:SS), but personally I like iso8601 (YYYY-MM-DD HH:MM:SS).
# simple 1-liner to toggle date_format to iso8601:
sed -i -e 's/^\(date_format\)=.*$/\1=iso8601/g' \
   /etc/nagios/nagios.cfg
check_for_updates The default is 1 (which is to check for updates). Personally, I don’t want my web page pinging Nagios every time i access the website for updates. Here is how you can do the same:
# disable update check:
sed -i -e 's/^\(check_for_updates\)=.*$/\1=0/g' \
   /etc/nagios/nagios.cfg

/etc/nagios/objects/contacts.cfg:
This is where a default contact has been created. Feel free to open this up and add your name and (especially the) email address.
/etc/nagios/objects/commands.cfg:
All of the possible checks Nagios can perform on it’s own (without any plugins or extensions) are identified here. It’s as easy as defining a command_name (give it some name) and then tell it what you want to execute with the command_line directive.

A Nagios command is really simple to write; you can write one in any language you want. Here is one written in bash shell:

#!/bin/bash
# Path: /usr/lib64/nagios/plugins/check_temp_file_count
# Please keep in mind this script is pretty useless but
# it's goal is to show you how easy it is to write a
# script for Nagios.
#
# The rules are simple:
# Whatever we echo to the screen get's displayed to Nagios
# and our return value from our script/program will
# determine the color code (and whether or not we alarm)

# A return code of zero (0) tells Nagios everything is okay
RET_OKAY=0

# A return code of one (1) tells Nagios we're reporting a warning
# about whatever it is we're monitoring
RET_WARN=1

# A return code of two (2) tells Nagios we're reporting a critical
# error
RET_CRIT=2

# I define 3 here, but quite honestly, anything you return that
# does not fit in the 0, 1 or 2 response type (as identified
# above) is considered an unknown state.  Try to avoid this
# type if you can.
RET_UNKN=3

# As a test script we'll count all of the files and directories
# in the /tmp folder
COUNT=$(find /tmp 2>/dev/null | wc -l)

# If we have less then 10 files we'll tell Nagios everything
# is okay:
if [ $COUNT -lt 10 ]; then
   echo "$COUNT files; everything is good!"
   exit $RET_OKAY
fi

# If we have less then 30 files we'll tell Nagios that
# it should report a warning
if [ $COUNT -lt 30 ]; then
   echo "$COUNT files; caution!"
   exit $RET_WARN
fi

# Anything more we should report a critical alarm
echo "$COUNT files; Critical!!"
exit $RET_CRIT

If you’re familiar with Perl, the Nagios team has made a framework with it which you use to make your packages too! I’ve already gone ahead and packaged the perl-Nagios-Monitoring-Plugin rpm for you if you want it.

Nagios will associate /usr/lib64/nagios/plugins/ as it’s plugin directory; so you should save any plugins you create there (to keep them all in a common location). Plus if you intend to use SELinux, this is the directory that Nagios is allowed to execute from.

Our entry in the commands.cfg for this new script might look like this:

; 'check_temp_file_count' command definition
; $USER1$ gets translated automatically to our Nagios Plugin directory
; so in our case: /usr/lib64/nagios/plugins/
; Ideally you should call the command the same name as the checking
; tool you wrote.
define command{
	command_name	check_temp_file_count
	command_line	$USER1$/check_temp_file_count
	}

/etc/nagios/objects/localhost.cfg:
This is just a general configuration file for the very machine Nagios is running on. If you open it up, you’ll see that it defines a lot of entries that reference commands (defined already in the commands.cfg file).

A new entry in the commands.cfg for this new script might look like this:

; the 'use' directive identifies a template of information to save
; us from typing it all out here.  For now; just leave this as
; local-service.
;
; The 'hostname' defines our server (defined at the very top of this same
; file. Since the host at the to was defined as 'localhost', we need
; to use this same name here too
;
; The next field is just a description field; it will be how this service
; is presented on Nagios through the website
;
; The check_command is the name we gave it in the commands.cfg
; file.
define service{
        use                             local-service
        host_name                       localhost
        service_description             Count our Temporary Files
        check_command                   check_temp_file_count
}

Check to see if Nagios has any errors with any new configuration you provided:

# This command just tells Nagios to read in it's configuration
# and check if it appears valid:
nagios -v /etc/nagios/nagios.cfg

If everything checks out okay, go ahead and reload Nagios with our
new configuration:

# Reload Nagios
systemctl reload nagios.service

Nagios Plugins

If you’ve installed Nagios, there isn’t really any good reason why you shouldn’t just install the Nagios Plugins too. This is just more tools and checking scripts to make Nagios all that more powerful. The best part is, these tools have been tested over the years, so they’re already proven to be reliable and will allow you to accomplish most monitoring without much effort.

Nagios Plugins
Nagios Plugins

It’s important to note that the Nagios Plugin RPMs are NOT required by Nagios to run correctly. They merely just improve it’s existing functionality. You may however want to install the plugins you’re interested in that monitor systems you’re using.

The current version of the Nagios Plugins (at the time of writing this blog) is v2.1.3. You can download the latest version from my repository (if you’re set up) as follows:

# Install Nagios Core using NuxRef repositories
# See: https://nuxref.com/repo for more information
yum install -y nagios-plugins nagios-plugins-selinux

You can also download the packages manually if you wish using this table:

Package Download Description
nagios-plugins el7.rpm 50+ plugins that are fully adaptable to Nagios in every way. If you’re planning on installing Nagios, don’t forget about adding this package for it’s convenience!
nagios-plugins-selinux el7.rpm An optional add-on package that allows you to use the Nagios Plugins in Enforcing Mode.
nagios-plugins-ldap el7.rpm A Nagios plugin that can be used to check integrity and data entries within an LDAP database.
nagios-plugins-mysql el7.rpm A Nagios plugin that can be used to check integrity and data entries within an MySQL (or Maria) database.
nagios-plugins-ntp el7.rpm A Nagios plugin that can be used to check the NTP status of the machine it’s called on.
nagios-plugins-pgsql el7.rpm A Nagios plugin that can be used to check integrity and data entries within an PostgreSQL database.
nagios-plugins-samba el7.rpm A Nagios plugin that can be used to check status of your Samba mounts and their availability.
nagios-plugins-snmp el7.rpm A Nagios plugin that can query SNMP enabled appliances (routers, firewalls, switches, servers) and convert their output back to something Nagios can monitor or report.

Note: The source rpm can be obtained here which builds everything you see in the table above. It’s not required for the application to run, but might be useful for developers or those who want to inspect how I put the package together.

The main thing to know about this package after it is installed is the slew of new plugins available to you in /usr/lib64/nagios/plugins/ and a config file to get you started which references most of them in /etc/nagios/conf.d/nagios-plugin-commands.cfg.

Extra Plugins

There are a lot of great plugins on Nagios Exchange! I packaged just a few of them because they required patches and tweaks to work out of the box. All of these are available on my repository, but feel free to haul them down directly here:

Package Download Description
nagios-plugins-lvm el7.rpm / src.rpm / NE Source This plugin finds all LVM logical volumes, checks their used space, and compares against the supplied thresholds.
nagios-plugins-crm el7.rpm / src.rpm / NE Source A plugin for monitoring a Pacemaker/Corosync cluster.
Note: that this plugin requires perl-Nagios-Monitoring-Plugin to work.
nagios-plugins-drbd84 el7.rpm / src.rpm / NE Source A plugin for monitoring a DRBD v8.4 setup.

Credit

This blog took me a very (,very) long time to put together and test! The repository hosting alone accommodates all my blog entries up to this date. All of the custom packaging in this blog was done by me personally. If you like what you see and wish to copy and paste this HOWTO, please reference back to this blog post at the very least. It’s really all I ask.

Sources

The remaining portions of this series can be found here:

  • Part 2 – NRDP for Nagios Core on CentOS 7.x: This blog explains how awesome NRDP really is and why it might become a vital asset to your own environment. It’s also provides the first set of working RPMs (with SELinux support of course) of it’s kind.
  • Part 3 – NRPE for Nagios Core on CentOS 7.x: This blog explains how to set up NRPE (v3.x) for your Nagios environment. At the time this blog was written, there was no packaging of it’s kind for this version. So allow me be the first to share it with you!

Configuring and Installing NRPE and NSCA into Nagios Core 4 on CentOS 6

Introduction

About a month ago I wrote (and updated) an article on how to install Nagios Core 4 onto your system. I’m a bit of a perfectionist, so I’ve rebuilt the packages a little to accommodate my needs. Now I thought it might be a good idea to introduce some of the powerful extensions you can get for Nagios.

For an updated solution, you may wish to check out the following:

  • NRDP for Nagios Core on CentOS 7.x: This blog explains how awesome NRDP really is and why it might become a vital asset to your own environment. This tool can be used to replace NSCA’s functionality. The blog also provides the first set of working RPMs (with SELinux support of course) of it’s kind to support it.
  • NRPE for Nagios Core on CentOS 7.x: This blog explains how to set up NRPE (v3.x) for your Nagios environment. At the time this blog was written, there was no packaging of it’s kind for this version.

RPM Solution

RPMs provide a version control and an automated set of scripts to configure the system how I want it. The beauty of them is that if you disagree with something the tool you’re packaging does, you can feed RPMs patch files to accommodate it without obstructing the original authors intention.

Now I won’t lie and claim I wrote these SPEC files from scratch because I certainly didn’t. I took the stock ones that ship with these products (NRPE and NSCA) and modified them to accommodate and satisfy my compulsive needs. πŸ™‚

My needs required a bit more automation in the setup as well as including:

  • A previous Nagios requirement I had was a /etc/nagios/conf.d directory to behave similar to how Apache works. I wanted to be able to drop configuration files into here and just have it work without re-adjusting configuration files. In retrospect of this, these plugins are a perfect example of what can use this folder and work right out of the box.
  • These new Nagios plugins should adapt to the new nagiocmd permissions. The nagioscmd group permission was a Nagios requirement I had made in my previous blog specifically for the plugin access.
  • NSCA should prepare some default configuration to make it easier on an administrator.
  • NSCA servers that don’t respond within a certain time should advance to a critical state. This should be part of the default (optional) configuration one can use.
  • Both NRPE and NSCA should plug themselves into Nagios silently without human intervention being required.
  • Both NRPE and NSCA should log independently to their own controlled log file that is automatically rotated by the system when required.

Nagios Enhancement Focus

The key things I want to share with you guys that you may or may not find useful for your own environment are the following:

  • Nagios Remote Plugin Executor (NRPE): NRPE (officially accessed here) provides a way to execute all of the Nagios monitoring tools on a remote server. These actions are all preformed through a secure (private) connection to the remote server and then reported back to Nagios. NRPE can allow you to monitor servers that are spread over a WAN (even the internet) from one central monitoring server. This is truly the most fantastic extension of Nagios in my opinion.
    NRPE High Level Overview
    NRPE High Level Overview
  • Nagios Service Check Acceptor (NSCA): NSCA (officially accessed here) provides a way for external applications to report their status directly to the Nagios Server on their own. This solution still allows the remote monitoring of a system by taking the responsibility off of the status checks off of Nagios. However the fantastic features of Nagios are still applicable: You are still centrally monitoring your application and Nagios will immediately take action in notifying you if your application stops responding or reports a bad status. This solution is really useful when working with closed systems (where opening ports to other systems is not an option).
    NSCA High Level Overview
    NSCA High Level Overview

Just give me your packaged RPMS

Here they are:

How do I make these packages work for me?

In all cases, the RPMs take care of just about everything for you, so there isn’t really much to do at this point. Some considerations however are as follows:

  • NRPE
    NRPE - Nagios Remote Plugin Executor
    NRPE – Nagios Remote Plugin Executor

    In an NRPE setup, Nagios is always the client and all of the magic happens when it uses the check_nrpe plugin. Most of NRPE’s configuration resides at the remote server that Nagios will monitor. In a nutshell, NRPE will provide the gateway to check a remote system’s status but in a much more secure and restrictive manor than the check_ssh which already comes with the nagios-plugins package. The check_ssh requires you to create a remote user account it can connect with for remote checks. This can leave your system vulnerable to an attack since you can do a lot more damage with a compromised SSH account. However check_nrpe uses the NRPE protocol and can only return what you let it; therefore making it a MUCH safer choice then check_ssh!

    You’ll want to install nagios-plugins-nrpe on the same server your hosting Nagios on:

    # Download NRPE
    wget --output-document=nagios-plugins-nrpe-2.15-1.el6.x86_64.rpm http://repo.nuxref.com/centos/6/en/x86_64/custom/nagios-plugins-nrpe-2.15-4.el6.nuxref.x86_64.rpm
    
    # Now install it
    yum -y localinstall nagios-plugins-nrpe-2.15-1.el6.x86_64.rpm
    

    Again I must stress, the above setup will work right away presuming you chose to use my custom build of Nagios introduced in my blog that went with it.

    Just to show you how everything works, we’ll make the Nagios Server the NRPE Server as well. In real world scenario, this would not be the case at all! But feel free to treat the setup example below on a remote system as well because it’s configuration will be identical! πŸ™‚

    # Install our NRPE Server
    wget --output-document=nrpe-2.15-1.el6.x86_64.rpm http://repo.nuxref.com/centos/6/en/x86_64/custom/nrpe-2.15-4.el6.nuxref.x86_64.rpm
    
    # Install some Nagios Plugins we can configure NRPE to use
    wget --output-document=nagios-plugins-1.5-1.x86_64.rpm http://repo.nuxref.com/centos/6/en/x86_64/custom/nagios-plugins-1.5-5.el6.nuxref.x86_64.rpm
    
    # Now Install it
    yum -y localinstall nrpe-2.15-1.el6.x86_64.rpm 
       nagios-plugins-1.5-1.x86_64.rpm
    # This tool requires xinetd to be running; start it if it isn't
    # already running
    service xinetd status || service xinetd start
    
    # Make sure our system will always start xinetd
    # even if it's rebooted
    chkconfig --level 345 xinetd on
    

    Now we can test our server by creating a test configuration:

    # Create a NRPE Configuration our server can accept
    cat << _EOF > /etc/nrpe.d/check_mail.cfg
    command[check_mailq]=/usr/lib64/nagios/plugins/check_mailq -c 100 -w 50
    _EOF
    
    # Create a temporary test configuration to work with:
    cat << _EOF > /etc/nagios/conf.d/nrpe_test.cfg
    define service{
       use                 local-service
       service_description Check Users
       host_name           localhost
       # check_users is already defined for us in /etc/nagios/nrpe.cfg
    	check_command		  check_nrpe!check_users
    }
    
    # Test our new custom one we just created above
    define service{
       use                 local-service
       service_description Check Mail Queue
       host_name           localhost
       # Use the new check_mailq we defined above in /etc/nrpe.d/check_mail.cfg
    	check_command		  check_nrpe!check_mailq
    }
    _EOF
    
    # Reload Nagios so it sees our new configuration defined in
    # /etc/nagios/conf.d/*
    service nagios reload
    
    # Reload xinetd so nrpe sees our new configuration defined in
    # /etc/nrpe.d/*
    service xinetd reload
    

    We can even test our connection manually by calling the command:

    # This is what the output will look like if everything is okay:
    /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_mailq
    OK: mailq is empty|unsent=0;50;100;0
    

    Another scenario you might see (when setting on up on your remote server) is:

    /usr/lib64/nagios/plugins/check_nrpe -H localhost -c check_mailq
    CHECK_NRPE: Error - Could not complete SSL handshake.
    

    Uh oh, Could not complete SSL handshake.! What does that mean?
    This is the most common error people see with the NRPE plugin. If you Google it, you’ll get an over-whelming amount of hits suggesting how you can resolve the problem. I found this link useful.
    That all said, I can probably tell you right off the bat why it isn’t working for you. Assuming you’re using the packaging I provided then it’s most likely because your NRPE Server is denying the requests your Nagios Server is making to it.

    To fix this, access your NRPE Server and open up /etc/xinetd/nrpe in an editor of your choice. You need to allow your Nagios Server access by adding it’s IP address to the only_from entry. Or you can just type the following:

    # Set your Nagios Server IP here:
    NAGIOS_SERVER=192.168.192.168
    
    # If you want to keep your previous entries and append the server
    # you can do the following (spaces delimit the servers):
    sed -i -e "s|^(.*only_from[^=]+=)[ t]*(.*)|1 2 $NAGIOS_SERVER|g" 
       /etc/xinetd.d/nrpe
    
    # The below command is fine too to just replace what is there
    # with the server of your choice (you can use either example
    sed -i -e "s|^(.*only_from[^=]+=).*|1 $NAGIOS_SERVER|g" 
       /etc/xinetd.d/nrpe
    
    # When your done, restart xinetd to update it's configuration
    service xinetd reload
    

    Those who didn’t receive the error I showed above, it’s only because your using your Nagios Server as your NRPE Server too (which the xinetd tool is pre-configured to accept by default). So please pay attention to this when you start installing the NRPE server remotely.

    You will want to install nagios-plugins-nrpe on to your NRPE Server as well granting you access to all the same great monitoring tools that have already been proven to work and integrate perfectly with Nagios. This will save you a great deal of effort when setting up the NRPE status checks.

    As a final note, you may want to make sure port 5666 is open on your NRPE Server’s firewall otherwise the Nagios Server will not be able to preform remote checks.

    ## Open NRPE Port (as root)
    iptables -A INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT
    
    # consider adding this change to your iptables configuration
    # as well so when you reboot your system the port is
    # automatically open for you. See: /etc/sysconfig/iptables
    # You'll need to add a similar line as above (without the
    # iptables reference)
    # -A INPUT -m state --state NEW -m tcp -p tcp --dport 5666 -j ACCEPT
    
  • NSCA
    NSCA - Nagios Service Check Acceptor
    NSCA – Nagios Service Check Acceptor

    Remember, NSCA is used for systems that connect to you remotely (instead of you connecting to them (what NRPE does). This is a perfect choice plugin for systems you do not want to open ports up to unnecessarily on your remote system. That said, it means you need to open up ports on your Monitoring (Nagios) server instead.

    You’ll want to install nsca on the same server your hosting Nagios on:

    # Download NSCA
    wget --output-document=nsca-2.7.2-9.el6.x86_64.rpm http://repo.nuxref.com/centos/6/en/x86_64/custom/nsca-2.7.2-10.el6.nuxref.x86_64.rpm
    
    # Now install it
    yum -y localinstall nsca-2.7.2-9.el6.x86_64.rpm
    
    # This tool requires xinetd to be running; start it if it isn't
    # already running
    service xinetd status || service xinetd start
    
    # Make sure our system will always start xinetd
    # even if it's rebooted
    chkconfig --level 345 xinetd on
    
    # SELinux Users may wish to turn this flag on if they intend to allow it
    # to call content as root (using sudo) which it must do for some status checks.
    setsebool -P nagios_run_sudo on
    

    The best way to test if everything is working okay is by also installing the nsca-client on the same machine we just installed NSCA on (above). Then we can simply create a test passive service to test everything with. The below setup will work presuming you chose to use my custom build of Nagios introduced in my blog that went with it.

    # First install our NSCA client on the same machine we just installed NSCA
    # on above.
    wget http://repo.nuxref.com/centos/6/en/x86_64/custom/nsca-client-2.7.2-10.el6.nuxref.x86_64.rpm
    
    # Now install it
    yum -y localinstall nsca-client-2.7.2-9.el6.x86_64.rpm
    
    # Create a temporary test configuration to work with:
    cat << _EOF > /etc/nagios/conf.d/nsca_test.cfg
    # Define a test service. Note that the service 'passive_service'
    # is already predefined in /etc/nagios/conf.d/nsca.cfg which was
    # placed when you installed my nsca rpm
    define service{
       use                 passive_service
       service_description TestMessage
       host_name           localhost
    }
    _EOF
    
    # Now reload Nagios to it reads in our new configuration
    # Note: This will only work if you are using my Nagios build
    service nagios reload
    

    Now that we have a test service set up, we can send it different nagios status through the send_nsca binary that was made available to us after installing nsca-client.

    # Send a Critical notice to Nagios using our test service
    # and send_nsca. By default send_nsca uses the '<tab>' as a
    # delimiter, but that is hard to show in a blog (it can get mixed up
    # with the space.  So in the examples below i add a -d switch
    # to adjust what the delimiter in the message.
    # The syntax is simple:
    #    hostname,nagios_service,status_code,status_msg
    #
    # The test service we defined above identifies both the
    # 'host_name' and 'service_description' define our first 2
    # delimited columns below. The status_code is as simple as:
    #       0 : Okay
    #       1 : Warning
    #       2 : Critical
    # The final delimited entry is just the human readable text
    # we want to pass a long with the status.
    #
    # Here we'll send our critical message:
    cat << _EOF | /usr/sbin/send_nsca -H 127.0.0.1 -d ','
    localhost,TestMessage,2,This is a Test Error
    _EOF
    
    # Open your Nagios screen (http://localhost/nagios) at this point and watch the
    # status change (it can take up to 4 or 5 seconds or so to register
    # the command above).
    
    # Cool?  Here is a warning message:
    cat << _EOF | /usr/sbin/send_nsca -H 127.0.0.1 -d ',' -c /etc/nagios/send_nsca.cfg
    localhost,TestMessage,1,This is a Test Warning
    _EOF
    
    # Check your warning on Nagios, when your happy, here is your
    # OKAY message:
    cat << _EOF | /usr/sbin/send_nsca -H 127.0.0.1 -d ',' -c /etc/nagios/send_nsca.cfg
    localhost,TestMessage,0,Life is good!
    _EOF
    

    Since NSCA requires you to listen to a public port, you’ll need to know this last bit of information to complete your NSCA configuration. Up until now the package i provide only open full access to localhost for security reasons. But you’ll need to take the next step and allow your remote systems to talk to you.

    NSCA uses port 5667, so you’ll want to make sure your firewall has this port open using the following command:

    ## Open NSCA Port (as root)
    iptables -A INPUT -m state --state NEW -m tcp -p tcp --dport 5667 -j ACCEPT
    
    # consider adding this change to your iptables configuration
    # as well so when you reboot your system the port is
    # automatically open for you. See: /etc/sysconfig/iptables
    # You'll need to add a similar line as above (without the
    # iptables reference)
    # -A INPUT -m state --state NEW -m tcp -p tcp --dport 5667 -j ACCEPT
    

    Another security in place with the NSCA configuration you installed out of
    the box is that it is being managed by xinetd. The configuration can
    be found here: /etc/xinetd.d/nsca. The security restriction in place that you’ll want to pay close attention to is line 16 which reads:

    only_from = 127.0.0.1 ::1

    If you remove this line, you’ll allow any system to connect to yours; this is a bit unsafe but an option. Personally, I recommend that you individually add each remote system you want to monitor to this line. Use a space to separate more the one system.

    You can consider adding more security by setting up a NSCA paraphrase which will reside in /etc/nagios/nsca.cfg to which you can place the same paraphrase in all of the nsca-clients you set up by updating /etc/nagios/send_nsca.cfg.

    Consider our example above; I can do the following to add a paraphrase:

    # Configure Client
    sed -i -e 's/^#*password=/password=ABCDEFGHIJKLMNOPQRSTUVWXYZ/g' 
       /etc/nagios/send_nsca.cfg
    # Configure Server
    sed -i -e 's/^#*password=/password=ABCDEFGHIJKLMNOPQRSTUVWXYZ/g' 
       /etc/nagios/nsca.cfg
    # Reload xinetd so it rereads /etc/nagios/nsca.cfg
    service xinetd reload
    

I don’t trust you, I want to repackage this myself!

As always, I will always provide you a way to build the source code from scratch if you don’t want to use what I’ve already prepared. I use mock for everything I build so I don’t need to haul in development packages into my native environment. You’ll need to make sure mock is setup and configured properly first for yourself:

# Install 'mock' into your environment if you don't have it already.
# This step will require you to be the superuser (root) in your native
# environment.
yum install -y mock

# Grant your normal every day user account access to the mock group
# This step will also require you to be the root user.
usermod -a -G mock YourNonRootUsername

At this point it’s safe to change from the ‘root‘ user back to the user account you granted the mock group privileges to in the step above. We won’t need the root user again until the end of this tutorial when we install our built RPM.

Just to give you a quick summary of what I did, here are the new spec files and patch files I created:

  • NSCA RPM SPEC File: Here is the enhanced spec file I used (enhancing the one already provided in the EPEL release found on pkgs.org). At the time I wrote this blog, the newest version of NSCA was v2.7.2-8. This is why I repackaged it as v2.7.2-9 to include my enhancements. I created 2 patches along with the spec file enhancements.
    nrpe.conf.d.patch was created to provide a working NRPE configuration right out of the box (as soon as it was installed) and nrpe.xinetd.logrotate.patch was created to pre-configure a working xinetd server configuration.
  • NRPE RPM SPEC File: Here is the enhanced spec file I used (enhancing the one already provided in the EPEL release found on pkgs.org). At the time I wrote this blog, the newest version of NRPE was v2.14-5. However v2.15 was available off of the Nagios website so this is why I repackaged it as v2.15-1 to include my enhancements.
    nsca.xinetd.logrotate.patch was the only patch I needed to create to prepare a NSCA xinetd server working out of the box.

Everything else packaged (patches and all) are the same ones carried forward from previous versions by their package managers.

Rebuild your external monitoring solutions:

Below shows the long way of rebuilding the RPMs from source.

# Perhaps make a directory and work within it so it's easy to find
# everything later
mkdir nagiosbuild
cd nagiosbuild
###
# Now we want to download all the requirements we need to build
###
# Prepare our mock environment
###
# Initialize Mock Environment
mock -v -r epel-6-x86_64 --init

# NRPE (v2.15)
wget http://repo.nuxref.com/centos/6/en/source/custom/nrpe-2.15-4.el6.nuxref.src.rpm 
mock -v -r epel-6-x86_64 --copyin nrpe-2.15-1.el6.src.rpm /builddir/build

# NSCA (v2.7.2)
wget http://repo.nuxref.com/centos/6/en/source/custom/nsca-2.7.2-10.el6.nuxref.src.rpm 
mock -v -r epel-6-x86_64 --copyin nsca-2.7.2-9.el6.src.rpm /builddir/build

#######################
### THE SHORT WAY #####
#######################
# Now, the short way to rebuild everything is through these commands:
mock -v -r epel-6-x86_64 --resultdir=$(pwd)/results 
   --rebuild  nrpe-2.15-1.el6.src.rpm  nsca-2.7.2-9.el6.src.rpm

# You're done; You can find all of your rpms in a results directory
# in the same location you typed the above command in.  You can 
# alternatively rebuild everything the long way allowing you to
# inspect the content in more detail and even change it for your
# own liking

#######################
### THE LONG WAY  #####
#######################
# Install NRPE Dependencies
mock -v -r epel-6-x86_64 --install 
   autoconf automake libtool openssl-devel tcp_wrappers-devel

# Install NSCA Dependencies
mock -v -r epel-6-x86_64 --install 
   tcp_wrappers-devel libmcrypt-devel

###
# Build Stage
###
# Shell into our enviroment
mock -v -r epel-6-x86_64 --shell

# Change to our build directory
cd builddir/build

# Install our SRPMS (within our mock jail)
rpm -Uhi nsca-*.src.rpm nrpe-*.src.rpm

# Now we'll have placed all our content in the SPECS and SOURCES
# directory (within /builddir/build).  Have a look to verify
# content if you like

# Build our RPMS
rpmbuild -ba SPECS/*.spec

# we're now done with our mock environment for now; Press Ctrl-D to
# exit or simply type exit on the command line of our virtual
# environment
exit

###
# Save our content that we built in the mock environment
###

#NRPE
mock -v -r epel-6-x86_64 --copyout /builddir/build/SRPMS/nrpe-2.15-1.el6.src.rpm .
mock -v -r epel-6-x86_64 --copyout /builddir/build/RPMS/nrpe-2.15-1.el6.x86_64.rpm .
mock -v -r epel-6-x86_64 --copyout /builddir/build/RPMS/nagios-plugins-nrpe-2.15-1.el6.x86_64.rpm .
mock -v -r epel-6-x86_64 --copyout /builddir/build/RPMS/nrpe-debuginfo-2.15-1.el6.x86_64.rpm .

#NSCA
mock -v -r epel-6-x86_64 --copyout /builddir/build/SRPMS/nsca-2.7.2-9.el6.src.rpm .
mock -v -r epel-6-x86_64 --copyout /builddir/build/RPMS/nsca-2.7.2-9.el6.x86_64.rpm .
mock -v -r epel-6-x86_64 --copyout /builddir/build/RPMS/nsca-client-2.7.2-9.el6.x86_64.rpm .
mock -v -r epel-6-x86_64 --copyout /builddir/build/RPMS/nsca-debuginfo-2.7.2-9.el6.x86_64.rpm .

# *Note that all the commands that interact with mock I pass in 
# the -v which outputs a lot of verbose information. You don't
# have to supply it; but I like to see what is going on at times.

# **Note: You may receive this warning when calling the '--copyout'
# above:
# WARNING: unable to delete selinux filesystems 
#    (/tmp/mock-selinux-plugin.??????): #
#    [Errno 1] Operation not permitted: '/tmp/mock-selinux-plugin.??????'
#
# This is totally okay; and is safe to ignore, the action you called
# still worked perfectly; so don't panic!

So where do I go from here?
NRPE and NSCA are both fantastic solutions that can allow you to tackle any monitoring problem you ever had. In this blog here I focus specifically on Linux, but these tools are also available on Microsoft Windows as well. You can easily have 1 Nagios Server manage thousands of remote systems (of all operating system flavours). There are hundreds of fantastic tools to monitor all mainstream applications used today (Databases, Web Servers, etc). Even if your trying to support a custom application you wrote. If you can interface with your application using the command line interface, well then Nagios can monitor it for you. You only need to write a small script with this in mind:

  • Your script should always have an exit code of 0 (zero) if everything is okay, 1 (one) if you want to raise a warning, and 2 (two) if you want to raise a critical alarm.
  • No matter what the exit code is, you should also echo some kind of message that someone could easily interpret what is going on.

There is enough information in this blog to do the rest for you (as far as creating a Nagios configuration entry for it goes). If you followed the 2 rules above, then everything should ‘just work’. It’s truely that easy and powerful.

How do I decide if I need NSCA or NRPE?

NRPE & NSCA High Level Overview
NRPE & NSCA High Level Overview

NRPE makes it Nagios’s responsibility to check your application where as NSCA makes it your applications responsible to report its status. Both have their pros and cons. NSCA could be considered the most secure approach because at the end of the day the only port that requires opening is the one on the Nagios server. NSCA does not use a completely secure connection (but there is encryption none the less). NRPE is very secure and doesn’t require you to really do much since it just simply works with the nagios-plugins already available. It litterally just extends these existing Nagios local checks to remote ones. NSCA requires you to configure a cron, or adjust your applications in such a way that it frequently calls the send_nsca command. NSCA can be a bit more difficult to set up but creates some what of a heartbeat between you and the system monitoring it (which can be a good thing too). I pre-configured the NSCA server with a small tweak that will automatically set your application to a critical state if a send_nsca call is missed for an extended period of time.

Always consider that the point of this blog was to state that you can use both at the same time giving you total flexibility over all of your systems that require monitoring.

Credit

All of the custom packaging in this blog was done by me personally. I took the open source available to me and rebuilt it to make it an easier solution and decided to share it. If you like what you see and wish to copy and paste this HOWTO, please reference back to this blog post at the very least. It’s really all I ask.

Sources

I referenced the following resources to make this blog possible:

  • The blog I wrote earlier that is recommended you read before this one:Configuring and Installing Nagios Core 4 on CentOS 6
  • Official NRPE download link; I used all of the official documentation to make the NRPE references on this blog possible.
  • A document identifying the common errors you might see and their resolution here.
  • Official NSCA download link; I used all of the official documentation to make the NSCA references on this blog possible.
  • The NRPE and NSCA images I’m reposting on this blog were taking straight from their official sites mentioned above.
  • Linux Packages Search (pkgs.org) was where I obtained the source RPMs as well as their old SPEC files. These would be a starting point before I’d expand them.
  • A bit outdated, but a great (and simple) representation of how NSCA works with Nagios can be seen here.