Tag Archives: Nagios Core

Integrate Apprise into Nagios for More Notification Support

Introduction

Apprise is an open source tool that allows you to send a notification through a wide range of messaging services out there (such as Discord, Slack, Telegram, Microsoft Teams, etc). Well when you combine this with Nagios, you open it up to a much larger scope then simply emailing on an alert.

With Apprise you can configure Nagios to text your mobile phone using Amazon’s Web Service, or notify your devops team on Slack and/or Microsoft Teams. You can even trigger an IFTTT event. But it doesn’t just stop there, Apprise already supports over 35+ notification services today (and is always expanding) which means Nagios could leverage all of this too. This blog will explain how you can set up your instance of Nagios to notify more end points then just email.

The Installation

I’m going to presume you have a copy of Nagios already installed. If you don’t you can check out my blog here on how to set up your own copy with CentOS 7. Those who are not using CentOS are certainly not out of luck though, there are lots of blogs out there to get you started.

This blog will assume you have root privileges or have sudoers privileges.

Apprise can be easily added to your system through pip:

# Install Apprise onto the system currently also hosting Nagios
sudo pip install apprise

Configure Nagios

The Nagios configuration files can vary in their location depending on what Linux distribution you’re using. I’m going to just refer to some standard paths used by the stuff I host here (for CentOS/RedHat).

Step 1: Nagios Import Directory

If you’re using the Nuxref RPMs, then you can skip this step and move to the next as you’ll already be configured for this. Those using another distribution will want to update their nagios.cfg to point to a directory we can use to drop in and remove configuration from. The file is presumably going to be located as: /etc/nagios/nagios.cfg):

# Place this anywhere in /etc/nagios/nagios.cfg
# preferably put it near the bottom of the file.

# Definitions for global configuration directory
cfg_dir=/etc/nagios/conf.d

Now make sure this directory exists because this is where we’ll place our new apprise configuration:

# Ensure our global include directory exists that we
# just defined in our nagios.cfg file:
mkdir -p /etc/nagios/conf.d

# Place a dummy file in here so that Nagios doesn't
# throw any errors (as it isn't a fan of include directories
# without configuration files in it).
touch /etc/nagios/conf.d/dummy.cfg

Step 2: Apprise/Nagios Integration

Now we need to let Nagios know about Apprise. We’ll do this by creating the following files called /etc/nagios/conf.d/apprise.cfg

#
# Apprise to Nagios Configuration File
# Place this file as /etc/nagios/conf.d/apprise.cfg
#
# 'notify-host-by-apprise' command definition
define command{
   command_name   notify-host-by-apprise
   command_line   /usr/bin/printf "%b" "- *Notification Type*: $NOTIFICATIONTYPE$\n- *Host*: $HOSTNAME$\n- *State*: \n- *Address*: $HOSTADDRESS$\n- *Info*: $HOSTOUTPUT$\n\n- *Date/Time*: $LONGDATETIME$\n" | /usr/bin/apprise -c /etc/nagios/apprise.yml -n "$HOSTSTATE$" -g "$NOTIFICATIONTYPE$" -t "** $NOTIFICATIONTYPE$ Host Alert: $HOSTNAME$ is $HOSTSTATE$ **"
}
 
# 'notify-service-by-apprise' command definition
define command{
   command_name   notify-service-by-apprise
   command_line   /usr/bin/printf "%b" "*Notification Type*: $NOTIFICATIONTYPE$\n- *Service*: $SERVICEDESC$\n- *Host*: $HOSTALIAS$\n- *Address*: $HOSTADDRESS$\n- *State*: $SERVICESTATE$\n- *Date/Time*: $LONGDATETIME$\n\n*Additional Info*:\n$SERVICEOUTPUT$\n" | /usr/bin/apprise -c /etc/nagios/apprise.yml -n "$HOSTSTATE$" -g "$NOTIFICATIONTYPE$" -t "** $NOTIFICATIONTYPE$ Service Alert: $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **"
}

# Register our contact template that we can reference
define contact{
  ; The name of this contact template
  name                             apprise-contact

  ; service notifications can be sent anytime
  service_notification_period      24x7

  ; host notifications can be sent anytime
  host_notification_period         24x7

  ; send notifications for all service states, flapping events,
  ; and scheduled downtime events
  service_notification_options     w,u,c,r,f,s

  ; send notifications for all host states, flapping events,
  ; and scheduled downtime events
  host_notification_options        d,u,r,f,s

  ; send service notifications via email
  service_notification_commands    notify-service-by-apprise

  ; send host notifications via email
  host_notification_commands       notify-host-by-apprise

  ; Don't register this as it is just a template for future
  ; references by contacts who wish to use the apprise plugin
  register                         0
}

Now for every contact we set up going forward, we can point it to use Apprise. By default Nagios usually provides us a contact.cfg file that contains the generic user nagiosadmin. For those using my packaging, you can find this file at /etc/nagios/objects/contacts.cfg; you’ll want to change it to looks like this:

define contact{
  ; Short name of (Nagios) user
  contact_name    nagiosadmin

  ; This next line used to read generic-contact; but we want to switch it
  ; over to our new Apprise based one:
  use             apprise-contact

  ; Full name of user
  alias           Nagios Admin

  ; not important if using apprise-contact (defined above)
  email           nagios@localhost
}

Before you advance to the next step, you’ll want to run a test flight check on your configuration and make sure it validates okay.

# Perform a flight check on our new configuration (as root)
sudo nagios -v /etc/nagios/nagios.cfg

If you get any errors, you should revisit the first part of this blog and try to iron them out before continuing. If everything is error free, then the next step is to reload our instance of Nagios (if it’s running) so it can re-read this configuration. This can be done with the command:

# You will need to be root to do this; send a SIGHUP
# to all instances of nagios running in memory:
sudo killall -HUP nagios

Step 3: Apprise Configuration

Now we need to prepare our Apprise configuration (/etc/nagios/apprise.yml) and fill it with the notification services we want listen for and who we want to pass it to.

We can associate with Nagios notifications types passed to us through tags. Nagios will pass these along one of the following $NOTIFICATIONTYPE$ when an event occurs; these are:

  • PROBLEM: There was an issue with one of the checks.
  • RECOVERY: The issue previously set has been cleared.
  • ACKNOWLEDGEMENT: An outstanding issue has been acknowledged.
  • FLAPPINGSTART: Flapping is a state where a service has a PROBLEM associated with it and then moments later has a RECOVERY. This is the state called FLAPPING. When this process occurs too many times in a row, this alert gets set.
  • FLAPPINGSTOP: The service that was previously FLAPPING is no longer doing so.
  • FLAPPINGDISABLED: Someone just disabled FLAPPING for this service/host.
  • DOWNTIMESTART: The scheduled downtime for this service/host has begun.
  • DOWNTIMEEND: The scheduled downtime is over.
  • DOWNTIMECANCELLED: Someone just cancelled the scheduled downtime for this service/host.

Knowing the above notification types that we’ll receive, here is what an Apprise configuration file located at /etc/nagios/apprise.yml might look like:

# This file should be placed in /etc/nagios/apprise.yml

# NOTE: THIS IS JUST AN EXAMPLE CONFIGURATION FILE. YOU WILL WANT
#       TO CUSTOMIZE YOUR OWN WITH THE SERVICE(S) OF YOUR CHOICE
#       VISIT https://github.com/caronc/apprise TO SEE WHAT IS
#       AVAILABLE AND HOW THEY WORK.

# Identify all of the global notification types we want to flag on.
tag:
  - PROBLEM
  - RECOVERY
  - FLAPPINGSTART
  - FLAPPINGSTOP

# Now we want to define our Apprise URLS; you'll want to visit
# https://github.com/caronc/apprise to see all of the supported
# services and how to build their URLs.
urls:

  # Maybe we want to notify a custom service we're hosting to
  # monitor and track Nagios status; Check out the following 
  # for more details https://github.com/caronc/apprise/wiki/Notify_json

  - json://localhost

  # Maybe we want to notify a Slack channel; more details on this
  # are here: https://github.com/caronc/apprise/wiki/Notify_slack
  - slack://T1JJ3T3L2/A1BRTD4JD/TIiajkdnlazkcOXrIdevi7F/#nuxref

  # the Apprise YAML configuration is quite powerful, the
  # following prepares the email URL and sends an email to each
  # user identified below:
  - email://user:password@gmail.com
      - to: george@example.com
      - to: admin@example.com

  # More details on the emails can be found here:
  # https://github.com/caronc/apprise/wiki/Notify_email

  # We can also individually disperse the tags in the same config
  # file.  The below tags will override the globals defined above.
  # A use case for this would be that maybe we just want to
  # send certain notification types to say... the DevOps team:
  - email://user:password@gmail.com
      - to: devops@example.com
        tag: DOWNTIMESTART, DOWNTIMEEND, DOWNTIMECANCELLED

Once you’re file is all ready, be sure this file is readable by Nagios (to keep it away from prying eyes), but otherwise you’re all set and ready to go!

# Here is what one might do to protect this apprise configuration
chmod 640 /etc/nagios/apprise.yml
chown nagios.root /etc/nagios/apprise.yml

Verification

To be sure everything works, you may want to just test that you got all of your configuration right You can test this using manually as follows:

# Test our configuration with apprise using the PROBLEM tag
# -vvv for some verbose debugging in-case we need it.
apprise -c /etc/nagios/apprise.yml \
    -n CRITICAL -g PROBLEM \
    -vvv \
    -t "A Test Title" \
    -b "a Test Body"

Here is a screenshot of a test error displayed on gitter.im that was sent by Nagios using Apprise:
Gitter Example

Sources

NRPE for Nagios Core on CentOS 7.x

Introduction

In continuation to part 1 and part 2 of this blog series… NRPE (Nagios Remote Plugin Executor) is yet another Client/Server plugin for Nagios (but can work with other applications too). Unlike NRDP, Nagios isn’t required for NRPE to function which means you can harness the power of this tool and apply it to many other applications too. It does however work ‘very’ well with Nagios and was originally designed for it.

If you read my blog on the NRDP protocol, then you’re already familiar with it’s push architecture where the Applications are responsible for reporting in their status. NRPE however works in the reverse fashion; NRPE requires you to pull the information from the Application Server instead. The status checking responsibility falls on the Nagios Server (instead of the Applications it monitors).

NRPE Overview
NRPE Overview

The diagram above illustrates how the paradigm works.

  • The A represents the Application Server. There is no limit to the number of these guys.
  • The N represents the Nagios Server. You’ll only ever need one Nagios server.

NRPE Query Overview

  1. The Nagios Server will make periodic status checks to the to the Application Server via the NRPE Client (check_nrpe).
  2. The Application Server will analyze the request it received and perform the status check (locally).
  3. When the check completes, it will pass the results back to the Nagios Server (via the same connection the NRPE Client started).
  4. Nagios will take the check_nrpe results and display it accordingly. If the check_nrpe tool can’t establish a connection to the NRPE Server (running on the Application Server), then it will report a failure.

    If the call to the check_nrpe tool takes to long to get a response (or a result) back, then it will be reported as a (timeout) failure. By default check_nrpe will wait up to a maximum of 10 seconds before it times out and gives up. You can change this if you find it too short (or to long).

Here is what you’re getting with this blog and the packaged rpms that go with it:

  • Nagios Core v4.x: updated RPM packaging which I continued to maintain and carry forward from my previous blogs.
    Note: You will need this installed in order to monitor applications with NRPE. See part 1 of this blog series for more information if you don’t already have it set up.
  • NRPE (Nagios Remote Plugin Executor) v3.x: My custom RPM packaging bringing NRPE v3.x to CentOS 7.x for the first time especially since I couldn’t find it available anywhere else (at the time of the this blog entry). I had to make a few modifications to it so that it would be easier used our environment such as:
    • I forwarded the useful patches from the old NRPE v2.x branch to the new NRPE 3.x branch that were applicable still.
    • I patched the SystemD startup script to work with CentOS/Red Hat systems.
    • A sudoer’s file is already to go for those who want to use sudo with their NRPE remote calls.
    • There is firewall configuration all ready to use with FirewallD.

    Best of all, with my RPMs, you can run SELinux in full Enforcing mode for that extra piece of mind from a security standpoint!

    The Goods

    For those who really don’t care and want to just jump right in with the product. Here you go!

    You can download the packages manually if you choose, or reference them using my repository:

    Package Download Description
    nrpe el7.rpm The NRPE Server: This should be installed on any server you want to monitor. The server will allow the machine to respond to requests sent to it via the check_nrpe (Nagios) plugin.
    nrpe-selinux el7.rpm An SELinux add-on package that allows the NRDP Server to operate under Enforcing Mode.

    Note: This RPM is not required by the NRDP server to run correctly.
    nagios-plugins-nrpe el7.rpm Our NRPE Client; this is the Nagios Plugin used to request information from the NRPE Server(s). You’ll only need to install this on the Nagios server (for the purpose of this blog). The RPM provides a small tool called check_nrpe that gets installed into the Nagios Plugin Directory (/usr/lib64/nagios/plugins).

    Through some simple configuration; Nagios can use the check_nrpe tool to monitoring anything you want just as long as the NRPE server is (installed and) running at the other end.

    Note: The source rpm can be obtained here which builds everything you see in the table above. It’s not required for the application to run, but might be useful for developers or those who want to inspect how I put the package together.

    NRPE Server Side Configuration

    The NRPE Server is usually installed onto all of the Application servers you want Nagios to monitor remotely. It provides a means of accepting requests to process (such as, what is your system load like?) and handling the request and returning the response back.

    Assuming you hooked up to my repository here, the NRDP server can can be easily installed with the following command:

    # Install NRPE (Server) on your Application Server
    yum install nrpe nrpe-selinux
    

    NRPE communicates through TCP port 5666; which means you may need to enable the ports in your firewall first to allow remote connections from Nagios. The below commands open up the protocol to everyone attached to your network. You should only perform this command if your Application Server will be running on a local private network:

    # Enable NRPE port (5666)
    firewall-cmd --permanent --add-service=nrpe
    firewall-cmd --add-service=nrpe
    

    If you’re opening this up to the internet, then you might want to just open the (NRPE) port exclusively for the Nagios Server. The following presumes you know the IP of the Nagios Server and will open access to ‘JUST’ that system:

    # Assuming 5.6.7.8 belongs to the Nagios Server you intend to
    # allow to monitor you; you might do the following:
    firewall-cmd --permanent --zone=public \
       --add-rich-rule='\
           rule family="ipv4" \
           source address="5.6.7.8/32" \
           port protocol="tcp" \
           port="5666" \
           accept'
    

    You’ll want to have a look at your NRPE Configuration as you will probably have to update a few lines. See /etc/nagios/nrpe.cfg and have a look for the following directives:

    Directive Details
    allowed_hosts This is added security for NRPE but can come back and haunt you if you ignore it (as nothing will work). You should specify the IP address of your Nagios server here and not allow anyone else! For example, if your Nagios server was 6.7.8.9, then you would put 6.7.8.9/32 here.
    dont_blame_nrpe This allows you to pass options into your remote checks. I personally think this is awesome, but there is no question that depending on the checks that accept arguments, it could could exploit content from your system you wouldn’t have otherwise wanted to share. This is specially the case if you grant NRPE Sudoers permission (discussed a bit lower). Set this value to one (1) to enable argument passing and zero (0) to disable it.
    allow_bash_command_substitution Bash substitution is something like $(hostname) (which might return something like node01.myserver). Set this value to one (1) to enable argument passing and zero (0) to disable it. I don’t use personally use this and therefore have it disabled on my system.

    Setting up NRPE Tokens

    Once your server is set up, there is one last thing you need to do. You need to associate tokens with some of the status checks you want to do. You see, NRPE won’t just execute any program you tell it to, it will only execute programs a specific way that you’ve allowed for. This is purely for security and it makes sense to do so! It’s really not that complicated, consider a token/command mapping like this (as an example):

    Token Command
    check_load # Check the system load:
    # – warning if greater than 10.0
    # – critical if grater then 20.0
    # – we map the check_load token to this command:
    /usr/lib64/nagios/plugins/check_load -w 10 -c 20

    You pass this information into NRPE by creating a .conf file in /etc/nrpe.d (assuming you’re using my RPMs) with the syntax:

    command[token]=/path/to/mapped/command

    So with respect to the example we started; it would look like this:

    command[check_load]=/usr/lib64/nagios/plugins/check_load -w 10 -c 20

    You can specify as many tokens as you like (each one has to be unique from the other). If you have a look in /etc/nrpd.d/, you’ll see one called common_checks.cfg which has a handful of useful commands to start you off with. You can add to this file or start another if you like (in the same directory with a .conf extension). Each time you make changes to the configuration (or another) you’ll need to signal NRPE to have it load any new changes you provided:

    # Restart our NRPE Service
    systemctl restart nrpe.service
    

    Here is a conceptual diagram that will help illustrate what was just explained here using the check_load example above:

    NRPE check_load Example
    NRPE check_load Example

    Sudoer’s Permission

    Substitute User Do (or sudo) allows you to run commands as other users. Most commonly people use sudo to elevate there permission to the root (superuser) privilege to execute a command. When the command has completed, they return back to their normal privileges.

    Some actions require you to have higher system authority to get anything good from them. This includes retrieve certain kinds of system/status information. For example, you can’t check how much mail is spooled and ready for remote delivery (on a Mail Server) unless you reduce the privileges of all of your stored mail (making it accessible by anyone); not a very good idea! But alternatively you can use sudo to grant one user permission to just run a command that can only fetch the number of mail items queued; this is a much better approach! Thus elevating a users permissions for just "special status checking only " commands isn’t so much of a risk. NRPE can be configured to have superuser permissions for it’s status checks which can allow you to monitor a lot more things! Here is how to do it:

    # First make sure that the sudoer's configuration doesn't
    # require tty (teletype terminals) endpoints to use. You
    # do so by commenting out the line that reads 'Defaults requiretty'
    # in the /etc/sudoers file (access it with the command visudo)
    # You can also do this with the below one-liner too
    sed -i -e 's/^\(Defaults[ \t]\+requiretty.*\)$/#\1/g' \
             /etc/sudoers
    
    # By doing this, you allow pseudo-teletype (pty) endpoints
    # to use the sudo command too. NRDP would be a pty service
    # for example since it's not an actual person running the
    # command.
    
    # Update our configuration to use the sudo command
    sed -i -e "s|^[ \t#]*\(command_prefix\)=.*|\1=/usr/bin/sudo|g" \
          /etc/nagios/nrpe.cfg
    
    # Restart our server if it's running
    systemctl restart nrpe.service
    

    SELinux users will want to also do this:

    # Allow NRPE/Nagios calls to run /usr/bin/sudo
    setsebool -P nagios_run_sudo on
    

    NRPE Client Side Configuration

    The client side simply consists of a small application (called check_nrpe) which connects remotely to any NRPE Server you tell it to and hands it a token for processing (provided your NRPE Server is configured correctly). The NRPE Server will take this token and execute a command that was associated with it and return the results back to you.

    # You'll want to be connected to my repositories for this to work:
    # See: https://nuxref.com/repo for more information
    # Install NRPE on the same server running Nagios
    yum install nagios-plugins-nrpe
    

    There is configuration already in place for you if you’re using my RPMS located in /etc/nagios/conf.d called nrpe.cfg. But you can feel free to create your own (or over-write it like so:

    cat << _EOF > /etc/nagios/conf.d/nrpe.cfg
    ; simple wrapper to check_nrpe for remote calls to NRPE servers
    define command{
       command_name check_nrpe
       command_line /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
    }
    
    ; check_nrpe with arguments enabled (enable the dont_blame_nrpe for this
    ; to work properly otherwise simply don't use it)
    define command{
       command_name check_nrpe_args
       command_line /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$ -a $ARG2$
    }
    

    You can now send command tokens to servers running NRPE via Nagios. A Nagios configuration might look like this:

    cat << _EOF > /etc/nagios/conf.d/my.application.server01.cfg
    ; first we define a host that we want to monitor.
    ; If you already have a host configured; you can skip this part
    define host{
    ; Name of host template to use. This host definition will inherit
    ; all variables that are defined in (or inherited by) the
    ; linux-server host template definition. You can find this
    ; in /etc/nagios/objects/templates.cfg if you're interested
            use                     linux-server
    
    ; Now we define the server we're going to monitor
            host_name               my-application-server01.nuxref.com
            alias                   my-application-server01.nuxref.com
            address                 192.168.1.2
    
            statusmap_image         redhat.png
            icon_image              redhat.png
            icon_image_alt          CentOS 7.x
    }
    
    ; Now we want to define our monitoring service that talks to our
    ; Application server with NRPE installed on it:
    define service{
    ; Name of service template to use. This service definition will inherit
    ; all variables that are defined in (or inherited by) the
    ; local-service service template definition. You can find this
    ; in /etc/nagios/objects/templates.cfg if you're interested
    	use				local-service
            host_name                       my-application-server01.nuxref.com
    	service_description		Our System Load
    
    ; The Exclamation mark (!) lets nagios know that check_load is the argument
    ; we want to pass into our check_nrpe command we defined earlier (as $ARG1$)
    	check_command			check_nrpe!check_load
    }
    _EOF
    

    If you want to make sure it’s going to work, before you go any further you can run the same command you just told Nagios to do (above):

    # The syntax 'check_nrpe!check_load' gets translated to:
    #  /usr/lib64/nagios/plugins/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
    #
    # Which we can further translate (for testing purposes) to:
    /usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.2 -c check_load
    

    If everything worked okay you should see something similar to the output:

    OK – load average: 0.00, 0.00, 0.00|load1=0.000;10.000;20.000;0; load5=0.000;10.000;20.000;0; load15=0.000;10.000;20.000;0

    You’ll want to reload Nagios to pick up on your new configuration so it can start calling this command too:

    # But before you reload it; it doesn't hurt to just check and make
    # sure your configuration is okay; You can test it out with:
    nagios -v /etc/nagios/nagios.cfg
    
    # correct any errors that display and rerun the above command until
    # everything checks out okay!
    
    # Now reload Nagios so it reads in it's new configuration:
    systemctl reload nagios.service
    

    NRPE vs. NRDP

    Which one should you use? Both NRDP and NRPE have their Pro’s and Cons.

    Benefits of using…
    NRPE NRDP
    You have central control over the checking periods of an application. You only ever need to open 1 TCP port; from a security standpoint; this is awesome!
    You don’t need Nagios to use this. If used properly, it can provide a great way to access remote servers for information and even execute administrative and maintenance commands. You only need to manage the Nagios configuration when a new server is added.
    Can send multiple status messages in one single (TCP) transaction.
    Can reflect a remote application status change immediately oppose to NRPE which one reflect the change until the next status check is performed.

    It should be worth noting that nothing is stopping you from using both NRDP and NRPE at the same time. You might choose an NRDP strategy for remote systems while choosing an NRPE strategy for all your systems residing in your private network. NRPE works over the internet as well, but just exercise caution and be sure to have SELinux running in Enforcing mode on all of the server end points.

    Credit

    This blog took me a very (,very) long time to put together and test! The repository hosting alone accommodates all my blog entries up to this date. All of the custom packaging described here was done by me personally. I took the open source available to me and rebuilt it to make it an easier solution and decided to share it. If you like what you see and wish to copy and paste this HOWTO, please reference back to this blog post at the very least. It’s really all I ask.

    Sources

NRDP for Nagios Core on CentOS 7.x

Introduction

In continuation to part 1 of this blog series… NRDP (Nagios Remote Data Processor) is a Client/Server plugin for Nagios. With the NRDP model, it is the Application (or the server hosting it) that is responsible for reporting in it’s status. The beauty of this is you can have applications installed all throughout your business, your home, across data centers, overseas, etc all reporting to 1 single Nagios instance. You can monitor your entire infrastructure with this tool.

NRDP Overview
NRDP Overview

The diagram above illustrates how the paradigm works.

  • The A represents the Application Server. There is no limit to the number of these guys.
  • The N represents the Nagios Server. You’ll only ever need one Nagios server.
  • The Application (and/or the server hosting it) will make periodic status reports to the Nagios server. It should report that everything is okay, or report that it isn’t. Out of the box (using my rpms) Nagios will wait for up to X minutes (configurable) for a message to be received before reporting the service/host as being offline. Ideally you should send at least 2 or 3 notifications within this grace period. Currently X is set to 5 minutes with the templates in the RPMs I provide. But you can change this.
  • If Nagios doesn’t hear from the Application for an extended period of time, then it assumes the worst and reports to the user that there is a problem.

NRDP is a newer chapter in the Nagios world which allows you to monitor more applications you wouldn’t have otherwise been able to do. NRDP replaces it’s predecessor NSCA which performs the exact same task/function. One of the key differences is that NRDP is much easier to set up and use than NSCA is. NRDP is also much less lines of code too! The RPMs I provide make the setup even easier. That said, if you’re interested in using NSCA instead, I still package it here (and client here). But lets get back to talking about NRDP….

NRDP Clients pushing to Central NRDP Server
NRDP Clients pushing to Central NRDP Server

Here is what you’re getting with this blog and the packaged rpms that go with it:

  • Nagios Core v4.x: updated RPM packaging which I continued to maintain and carry forward from my previous blogs.
    Note: You will need this installed in order to use NRDP. See part 1 of this blog series for more information if you don’t already have it set up.
  • NRDP (Nagios Remote Data Processor) v1.x: My custom RPM packaging to just work for everyone right out of the box. Here are some modifications I made to the package to make our lives easier:
    • The default insecure port is configured to use 5668.
    • The default secure port is configured to use 5669.
    • I included my own Apache configuration so that the service would run right out of the box.
    • There are a several patches I had to make in order for the NRDP package to even work in our CentOS environment.
    • There is firewall configuration all ready to use with FirewallD.

Best of all, with my RPMs, you can run SELinux in full Enforcing mode for that extra piece of mind from a security standpoint!

The Goods

For those who really don’t care and want to just jump right in with the product. Here you go!

You can download the packages manually if you choose, or reference them using my repository:

Package Download Description
nrdp el7.rpm The NRDP Server: This must be installed on the same server running Nagios. It is in charge of listening to requests sent by nrdp-clients and reporting their status directly to Nagios. Don’t forget to change your tokens which we’ll discuss further down if you don’t know what I’m talking about!
nrdp-selinux el7.rpm An SELinux add-on package that allows the NRDP Server to operate under Enforcing Mode.

Note: This RPM is not required by the NRDP server to run correctly.
nrdp-client el7.rpm The NRDP Client: You’ll only need to optionally install this on to the remote servers that will be sending their status along to your NRDP Server. Since the protocol is so simple to adapt, you can also just choose to build the client aspect into the program you want to monitor with. If you do choose to install this RPM, you get access to a small program called send_nrdp.php which allows you to post status messages to the Nagios Server.
nrdp-doc el7.rpm Just some documentation that is already publicly available on NRDP’s website.
Note: This RPM is not required by Nagios to run correctly.

Note: The source rpm can be obtained here which builds everything you see in the table above. It’s not required for the application to run, but might be useful for developers or those who want to inspect how I put the package together.

NRDP Server Side Configuration

The NRDP Server is only going to work if it’s installed on the same server as Nagios.

Assuming you hooked up to my repository here, the NRDP server can can be easily installed with the following command:

# Install NRDP (Server) on the same server as the one running Nagios
yum install nrdp nrdp-selinux

NRDP was written in PHP and therefore depends on a few small tweaks on your part. For one, you’ll want to make sure you have a timezone configured too to silence any potential errors and keep the reporting consistent. You’ll want to do this from your /etc/php.ini file; look for a directive called date.timezone. Here is a list of supported timezones by PHP. But you can also just do this which works too:

# This clever trick (a 1 liner) lets the timezone in the /etc/php.ini
# to that of whatever your current system is set to!
sed -i -e "s/^[; \t]\?\(date\.timezone\)[ \t]*=.*/\1 = \"$(date +%Z)\"/g" \
   /etc/php.ini

If Apache isn’t running; now’s the time to start it up.

# This simple command should be ran as root and will
# start Apache if it isn't running already, otherwise it
# will send a reload signal if it is.  Regardless,
# after you finish this command, the new NRDP Configuration
# will be running and ready to go.
systemctl status httpd.service && \
   systemctl reload httpd.service || \
   systemctl start httpd.service

# Make sure we start up on reboots (if we haven't done so already)
systemctl enable httpd.service

NRDP/Nagios Configuration

You need to let Nagios know what services it should expect notifications from. The key thing that separates these entries from any other Nagios entries you specified is the ‘use‘ directive. For hosts reporting in, you’ll want to use the generic-nrdp-host directive. For services reporting in, you’ll want to use the generic-nrdp-service entry.

A Configuration file on your Nagios Server (the same server running your NRDP Server) might look like this:

cat << _EOF > /etc/nagios/conf.d/somehost.cfg
# We might have a test server defined like this
define host {
    # Name of host template to use (notice we're using the
    # 'generic-nrdp-host' template).  If you want to actively ping
    # this server and not await to hear from it, you can change this
    # to use the 'linux-server' template instead.
    use                         generic-nrdp-host
    # Hostname
    host_name                   somehost
    # Human Readable
    alias                       My Test Server
}

# Now we might identify a component to associate with our test server called <em>somehost</em>.
define service{
   use                 generic-nrdp-service
   service_description someservice
   host_name           somehost
}
_EOF

With the templates I provide, Nagios will wait for up to 5 minutes for a notification before assuming nothing is coming (and changing the state to CRITICAL). If this is too long (or too short) of a wait time for you, you can change it by modifying the NRDP templates (or creating your own). The provided NRDP templates can be found here: /etc/nagios/conf.d/nrdp.conf. You’ll want to focus on the entry entitled freshness_threshold which is currently set to 300 (300 seconds is equal to 5 minutes) and change it to your liking.

Alternatively, you can over-ride the template value by just adding the freshness_threshold to the service configuration you create like so:

cat << _EOF > /etc/nagios/conf.d/somehost.10min.service.cfg
# Over-ride our template timeout with a new one specified
# in our service declaration
define service{
   use                 generic-nrdp-service
   service_description someservice
   host_name           somehost
   # Over-ride the defaults in our template and only go stale
   # if 10 minutes elapses (600 seconds = 10 min)
   freshness_threshold 600
}
_EOF

If you’ve modified your configuration in anyway; be sure to reload Nagios so that it can take in our new configuration:

# Reload our service (if it's running)
systemctl reload nagios.service

# Be sure to start it if it isn't already:
systemctl start nagios.service

Testing your NRDP Server Setup

Once NRDP is up and running it will provide you a simple website you can use to interact with it. This interface allows you to manually send passive checks through to Nagios which is really useful for testing if everything is working okay!

Here is what the test page looks like:

NRDP Server Side
NRDP Server Side – use nuxref as a token to test with.

Note: The blog does not touch on the Submit Nagios Command portion of this page; but that’s another topic for another day. You want to focus on the Submit Check Data section.

To use NRDP you will be required to provide a token with every submission you ever make to it. This is just a security feature to prevent people from sending messages into your system who shouldn’t be. By default (if you’re using my rpms), the token is nuxref. You’ll notice in the above example, I’ve gone ahead and filed this field in to show you where it goes.

To change this token (which you really should do) just open up the php file /etc/nrdp/config.php. You can specify as many tokens as you like that you wish to accept. Here is what you’re looking for:

// ...
// look for the authorized_tokens directive
// and add your tokens here:
$cfg['authorized_tokens'] = array(
   "the-longer-and-more-encrypted-your-token-is-the-better",
   "specify-as-many-tokens-as.-you-want",
);

If you followed the blog so far and created the test examples I provided (above), then you can now submit a passive check using the default settings; don’t forget to provide your token!

If everything goes according to plan, you should be able to check out your message safely being acknowledged:

NRDP Test Submission Results (using defaults)
NRDP Test Submission Results (using defaults)

NRDP Server Security

Since you can access a very simple NRDP webpage on the same server hosting Nagios via http://localhost:5668/ (insecure) or https://localhost:5669/ (secure), you’ll want to protect it from those who shouldn’t be visiting it.

The below commands open up the protocol to everyone on your network. You should only perform this command if your Nagios Server will be running on a local private network:

# Enable insecure NRDP port (5668)
firewall-cmd --permanent --add-service=nrdp
firewall-cmd --add-service=nrdp

# Enable secure NRDP port (5669)
firewall-cmd --permanent --add-service=nrdps
firewall-cmd --add-service=nrdps

If you’re opening this up to the internet, then you might want to just open the (NRDP) port(s) exclusively used by the remote applications you intend to allow reporting from. It might be worth investing time into fail2ban as well as an additional precaution. The following presumes you know the IP of the application server and will open access to ‘JUST’ that system:

# Assuming 1.2.3.4 belongs to a system you trust who will be sending you
# Nagios status reports via NRDP:
firewall-cmd --permanent --zone=public \
   --add-rich-rule='\
       rule family="ipv4" \
       source address="1.2.3.4/32" \
       port protocol="tcp" \
       port="5668" \
       accept'

# Here is the Secure version of the same command
firewall-cmd --permanent --zone=public \
   --add-rich-rule='\
       rule family="ipv4" \
       source address="1.2.3.4/32" \
       port protocol="tcp" \
       port="5669" \
       accept'

Note: Again, I want to stress you should only open ports you intend to use and to the Application Servers who intend to use it. If you don’t intend on using the insecure port, then don’t even open it on your firewall!

Once you’re satisfied that everything is working. You might want to disable GET requests on your NRDP Server. This will prevent people from using the test website and only accept POST requests. To do this, just have a look in /etc/httpd/conf.d/nrdp.conf and uncomment (hence, remove the #) the line that reads:

RewriteCond %{REQUEST_METHOD} !^(POST) [NC]

You’ll need to reload your Apache server for the changes to take effect. If ever you want to test again, just comment the line back out again:

# This simple command should be ran as root and will
# start Apache if it isn't running already, otherwise it
# will send a reload signal if it is.  Regardless,
# after you finish this command, the new NRDP Configuration
# will be running and ready to go.
systemctl status httpd.service && \
   systemctl reload httpd.service || \
   systemctl start httpd.service

# Make sure we start up on reboots (if we haven't done so already)
systemctl enable httpd.service

NRDP Client Side Configuration

The protocol is so simple, that you can easily adapt this into programs you write or tool-sets. You can also just simple use the tool provided

# Install Nagios Core using NuxRef
# See: https://nuxref.com/repo for more information
# Install NRDP Client (CLI) on any server you want to
# be able to report it's status to (onto Nagios)
yum install nrdp-client

Now… from our application server we would want to install our nrdp-client package. so that we can access our send_nrdp.php tool.

The tool is very simple to use; here is an example how you can send a simple status message to our Nagios server:

# The below sends a host acknowledgment to our NRDP server
# url:   web address to our Nagios server
# token: the security token (this is to prevent anyone from
#        sending status requests to your Nagios server.
#        The default is 'nuxref' until you change it (which you
#        really should consider doing)
# state: This will work the color coding of Nagios, you can
#        specify one of the following:
#           0 - OKAY (green)
#           1 - WARNING (yellow/orange)
#           2 - CRITICAL (red)
#           3 - UNKNOWN (blue)
# output: Attach a message with our state; this will be visible
#         as well from the Nagios display screen:

# Here is a NRDP notification:
send_nrdp.php --url=http://nagios.server.addr:5668 --token=nuxref \
   --host=somehost \
   --state=0 \
   --output="Test is is working great."

# This will send a warning status to our component we identified
send_nrdp.php --url=http://nagios.server.addr:5668 --token=nuxref \
   --host=somehost \
   --state=0 \
   --output="Test is is working great."

Note: It’s worth noting that there are no WARNING states for hosts. A zero (0) will report that it is online, and anything else will report it as CRITICAL.

NRDP Protocol

Are you a developer? If you are, you may want to avoid using the script and just talk directly to the NRDP Server from your application. It’s incredibly easy to do. If you’re not, don’t worry; the send_nrdp.php script looks after all of this for you. You can still read this section if you’re interested none the less.

NRDP is literally just a small PHP website that relays XML it receives via it’s restful API into a language that Nagios can interpret. In fact NRDP is just a website that writes specially formatted files into the /var/nagios/spool/checkresults/ directory to which Nagios scans for processing.

The NRDP Payload

The payload is XML; the schema looks like this:

<?xml version='1.0'?> 
<!-- For checking a host -->
<checkresults>
    <!-- The type can be either 'host' or 'service' -->
    <!-- The checktype is always 1 for relaying passive status updates -->
  <checkresult type='host' checktype='1'>
    <hostname>somehost</hostname>
    <!-- 0 = Success, 1 = Warning, 2 = Critical -->
    <state>0</state>
    <!-- A message we might want to associate  -->
    <output>Our somehost is behaving perfectly!</output>
  </checkresult>
</checkresults>

Easy peasy right? A Service message looks like this:

<?xml version='1.0'?> 
<!-- For checking a service -->
<checkresults>
    <!-- The type can be either 'host' or 'service' -->
    <!-- The checktype is always 1 for relaying passive status updates -->
  <checkresult type='service' checktype='1'>
    <!-- This has to be the hostname associated with the service -->
    <hostname>somehost</hostname>
    <!-- This has to be the service we configured using our NRDP
         template.  It's identified by the `service_description`
         field as part of our nagios configuration. -->
    <servicename>someservice</servicename>
    <!-- 0 = Success, 1 = Warning, 2 = Critical -->
    <state>2</state>
    <!-- A message we might want to associate  -->
    <output>Danger Will Robinson! Danger!</output>
  </checkresult>
</checkresults>

You can stack as many messages as you want in a single payload which really adds to NRDPs flexibility:

<?xml version='1.0'?> 
<!-- A mix of status updates -->
<checkresults>
  <checkresult type='host' checktype='1'>
    <hostname>somehost</hostname>
    <state>0</state>
    <output>Our testserver is behaving perfectly!</output>
  </checkresult>

  <checkresult type='service' checktype='1'>
    <hostname>somehost</hostname>
    <servicename>someservice</servicename>
    <state>2</state>
    <output>Houston, we have a problem.</output>
  </checkresult>

  <checkresult type='service' checktype='1'>
    <hostname>somehost</hostname>
    <servicename>my-other-service</servicename>
    <state>1</state>
    <output>Let's hope the problem goes away.</output>
  </checkresult>

</checkresults>

The Transaction

Since it’s a PHP website, it’s hosted via Apache, NginX, etc. This blog sets up Apache but nothing is stopping you from using another service.

This means however that all interactions are web requests. A transaction must be an URL encoded HTTP POST and it must include a special token as part of its payload (part of the verification process). If the token specified is invalid or missing, then the NRDP server will just reject the message. The same rules apply if you don’t POST the message.

You could accomplish this with Python like so:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Site: https://nuxref.com
# Desc: A simply python snip-it to show you how easy it is
#       to interface with NRDP server:
#
# Note: 'requests' (which is required) can be installed 
#       like so (if you don't already have it):
#  #> yum install python-requests
import requests

xml = """<?xml version='1.0'?> 
<checkresults>
  <checkresult type='service' checktype='0'>
    <hostname>somehost</hostname>
    <servicename>someservice</servicename>
    <state>0</state>
    <output>Everything is running great!</output>
  </checkresult>
</checkresults>"""

params = {
   'token': 'nuxref',
   'cmd': 'submitcheck',
   'XMLDATA': xml,
}
r = requests.post(
    'https://nagios.server.addr:5669',
    params=params,
    headers={'Content-Type': 'application/xml'},
    # This is only in place for those who have self signed secure
    # certificates for private networks; you may wish to change
    # this to True in a real environment if you have the ability
    # to verify the host
    verify=False,
)

# Ideally you want to see: 200
print(r.status_code)

# Print our data
print(r.text)

Credit

This blog took me a very (,very) long time to put together and test! The repository hosting alone accommodates all my blog entries up to this date. All of the custom packaging described here was done by me personally. I took the open source available to me and rebuilt it to make it an easier solution and decided to share it.

NRDP certainly does not work like this out of the box; here were the patches I created (see pull request #12 on NRDPs Repository for upstream pushes of this):

  1. nrdp-php_headerfix.patch: Fix PHP header so tool can be executed on the shell.
  2. nrdp-basic_authorized_token.patch:Default (nuxref) token for an out-of-the-box working solution.
  3. nrdp-date.timezone.warning.patch: Silence potential Timezone Warning Message.
  4. nrdp-permissions.patch: Better passive check permission handling for NRDP/Nagios
  5. nrdp-client-support_ports.patch: Support more then just port http 80 (as this blog sets up our NRDP server using ports 5669 and/or 5668).

If you like what you see and wish to copy and paste this HOWTO, please reference back to this blog post at the very least. It’s really all I ask.

Sources