
Sunday, 29 September 2013

Nagios Event Handler configuration

Hi,

Event handlers are an optional Nagios feature that runs a command whenever the state of a particular host or service changes. An obvious use for event handlers (especially with services) is letting Nagios proactively fix problems before anyone is notified.

Types of Event Handlers
----------------------------------------

There are two main types of event handlers that can be defined: service event handlers and host event handlers. Event handler commands are (optionally) defined in each host and service definition. Because these event handlers are only associated with particular services or hosts, I will call them "local" event handlers. If a local event handler has been defined for a service or host, it will be executed when that host or service changes state.

There are also global event handlers, which are executed before the local service and host event handlers. They run for every host and service configured in Nagios.

You can enable the global event handlers by adding the global_host_event_handler and global_service_event_handler options to the main Nagios configuration file. Each option takes the name of a command definition.
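As an illustration, a global service event handler is often used just to record every state change. The function below is a minimal sketch that is not part of the original setup: the function name, the log file path and the argument order are all assumptions. A matching command definition would have to pass $HOSTNAME$ $SERVICEDESC$ $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$ in that order.

```shell
#!/bin/sh
# Hypothetical global service event handler: append one line per state change.
# Assumed argument order:
#   $1=$HOSTNAME$ $2=$SERVICEDESC$ $3=$SERVICESTATE$ $4=$SERVICESTATETYPE$ $5=$SERVICEATTEMPT$
# LOGFILE is an assumed location; it can be overridden from the environment.
LOGFILE="${LOGFILE:-/usr/local/nagios/var/global_service_events.log}"

log_service_event() {
    if [ "$#" -ne 5 ]; then
        echo "usage: log_service_event HOST SERVICE STATE STATETYPE ATTEMPT" >&2
        return 1
    fi
    printf '%s host=%s service=%s state=%s type=%s attempt=%s\n' \
        "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" "$3" "$4" "$5" >> "$LOGFILE"
}
```

Saved as a standalone script, it would end with a call such as log_service_event "$@".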


Enabling Event Handlers in Nagios

To enable event handlers in Nagios, first add the option
enable_event_handlers=1 to the main configuration file /usr/local/nagios/etc/nagios.cfg.
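A quick way to confirm the option is set is to search the file for it. The function below is a small sketch with a hypothetical name; it only looks for an uncommented enable_event_handlers=1 line and does not validate the rest of the configuration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given nagios.cfg enables event handlers.
# Defaults to the configuration path used in this post; a missing file counts as "not enabled".
event_handlers_enabled() {
    cfg="${1:-/usr/local/nagios/etc/nagios.cfg}"
    # Match an uncommented "enable_event_handlers=1", tolerating spaces around "=".
    grep -Eq '^[[:space:]]*enable_event_handlers[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$cfg" 2>/dev/null
}

if event_handlers_enabled; then
    echo "event handlers are enabled"
else
    echo "event handlers are NOT enabled"
fi
```

To check the whole configuration, Nagios can also verify its config files itself with nagios -v /usr/local/nagios/etc/nagios.cfg (in a source install the binary is typically /usr/local/nagios/bin/nagios).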



When Are Event Handler Commands Executed?
Service and host event handler commands are executed when a service or host:
  • is in a "soft" error state
  • initially goes into a "hard" error state
  • recovers from a "soft" or "hard" error state

Soft States
Soft states occur for services and hosts in the following situations:
  • When a service or host check results in a non-OK state and it has not yet been (re)checked the number of times specified by the <max_check_attempts> option in the service or host definition. This is a soft error state.
  • When a service or host recovers from a soft error state. This is considered to be a soft recovery.

Hard States
Hard states occur for services in the following situations:
  • When a service check results in a non-OK state and it has been (re)checked the number of times specified by the <max_check_attempts> option in the service definition. This is a hard error state.
  • When a service recovers from a hard error state. This is considered to be a hard recovery.
  • When a service check results in a non-OK state and its corresponding host is either DOWN or UNREACHABLE. This is an exception to the general monitoring logic, but it makes sense: if the host isn't up, there is no point in rechecking the service.
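The service rules above can be summarised as a small shell function. This is a simplified model written for illustration (the function name is hypothetical, and it ignores host checks and recoveries); it shows how the attempt number and max_check_attempts decide between SOFT and HARD for a failing service check.

```shell
#!/bin/sh
# Simplified model of the SOFT/HARD rules for a non-OK service check result.
# Usage: service_state_type ATTEMPT MAX_CHECK_ATTEMPTS [HOST_STATE]
service_state_type() {
    attempt=$1
    max=$2
    host_state=${3:-UP}
    # Exception: if the host is DOWN or UNREACHABLE, the service goes straight to HARD.
    if [ "$host_state" != "UP" ]; then
        echo HARD
    elif [ "$attempt" -lt "$max" ]; then
        echo SOFT
    else
        echo HARD
    fi
}

# A service with max_check_attempts 4 that keeps failing:
for attempt in 1 2 3 4; do
    echo "attempt $attempt: CRITICAL $(service_state_type "$attempt" 4)"
done
```

With max_check_attempts 4, attempts 1 to 3 are SOFT and attempt 4 is HARD, which is why the DNS handler in this post acts on the third soft attempt.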


Writing Event Handler Commands
Event handler commands will likely be shell or Perl scripts, but they can be any type of executable that can run from a command prompt. At a minimum, the scripts should take the following macros as arguments:
For services: $SERVICESTATE$, $SERVICESTATETYPE$, $SERVICEATTEMPT$
For hosts: $HOSTSTATE$, $HOSTSTATETYPE$, $HOSTATTEMPT$
The script should examine the values of the arguments passed to it and take any necessary action based on those values.
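Before acting on its arguments, a handler can first check that it actually received sensible values (for example, if the command definition passes the wrong macros or none at all). The helper below is a hypothetical sketch of that check for the three service macros.

```shell
#!/bin/sh
# Hypothetical helper: validate $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
# before the handler acts on them. Returns 0 only if all three look valid.
valid_service_args() {
    [ "$#" -eq 3 ] || return 1
    case "$1" in OK|WARNING|UNKNOWN|CRITICAL) ;; *) return 1 ;; esac
    case "$2" in SOFT|HARD) ;; *) return 1 ;; esac
    # The attempt number must be a non-empty string of digits.
    case "$3" in ''|*[!0-9]*) return 1 ;; esac
    return 0
}
```

A handler could start with valid_service_args "$@" || exit 0 so that it does nothing on unexpected input.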

$SERVICESTATE$ can take one of four values:

1. OK
2. WARNING
3. UNKNOWN
4. CRITICAL
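These states come from the exit code of the check plugin: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. The hypothetical helper below translates an exit code into the state name that $SERVICESTATE$ would show; treating any other code as UNKNOWN is a simplifying assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: map a Nagios plugin exit code to a state name.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;  # 3, and (as an assumption here) any other code
    esac
}
```

For example, after running a check by hand such as /usr/local/nagios/libexec/check_http -H webtest1 (path assumed from the source-install layout used in this post), plugin_state $? prints the corresponding state.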

$SERVICESTATETYPE$ can take one of two values:

1. SOFT
2. HARD

$SERVICEATTEMPT$

The current check attempt number, counted up to the max_check_attempts value set in the service definition.

Here I am writing a script that changes the DNS entries when the httpd process goes down on a server. It switches the DNS record to another httpd server that serves the same data. This is done by editing the zone file on the DNS server.

Below is the script:

#!/bin/sh
#
# Event handler script for switching DNS away from a failed web server
#
# Note: This script will only edit the DNS zone if the service has been
#       retried 3 times (in a "soft" state) or if the web service somehow
#       manages to fall into a "hard" error state.
#
# Arguments (from commands.cfg): $1 = $SERVICESTATE$, $2 = $SERVICESTATETYPE$, $3 = $SERVICEATTEMPT$

# What state is the HTTP service in?
case "$1" in
OK)
    # The service just came back up, so don't do anything...
    ;;
WARNING)
    # We don't really care about warning states, since the service is probably still running...
    ;;
UNKNOWN)
    # We don't know what might be causing an unknown error, so don't do anything...
    ;;
CRITICAL)
    # Aha!  The HTTP service appears to have a problem - switch DNS to the standby web server.

    # Is this a "soft" or a "hard" state?
    case "$2" in

    # We're in a "soft" state, meaning that Nagios is in the middle of retrying the
    # check before it turns into a "hard" state and contacts get notified...
    SOFT)
        # What check attempt are we on?  We don't want to switch DNS on the first
        # check, because it may just be a fluke!
        case "$3" in

        # Wait until the check has been tried 3 times before switching DNS.
        # If the check fails on the 4th time, the state type will turn to "hard"
        # and contacts will be notified of the problem (if notifications are enabled).
        3)
            echo "Editing the DNS zone file on the DNS server (3rd soft critical state)..."
            # SSH into the DNS server and run the commands in script_dns.sh there.
            ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@192.168.6.209 < /usr/local/nagios/libexec/eventhandlers/script_dns.sh
            ;;
        esac
        ;;

    # The HTTP service somehow managed to turn into a hard error without getting fixed.
    # Let's give it one last try, shall we?
    # Note: Contacts have already been notified of a problem with the service at this
    # point (unless you disabled notifications for this service)
    HARD)
        echo "Editing the DNS zone file on the DNS server (hard critical state)..."
        # SSH into the DNS server and run the commands in script_dns.sh there.
        ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@192.168.6.209 < /usr/local/nagios/libexec/eventhandlers/script_dns.sh
        ;;
    esac
    ;;
esac

exit 0
-----------------------
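The decision the handler makes can be pulled out into a function and exercised without touching the DNS server. This is a hypothetical refactoring for testing, not part of the original script; it returns success only in the two cases where the handler above runs the ssh command.

```shell
#!/bin/sh
# Hypothetical: same decision as the handler above, without the ssh call.
# Usage: should_switch_dns STATE STATETYPE ATTEMPT
should_switch_dns() {
    [ "$1" = CRITICAL ] || return 1
    case "$2" in
        SOFT) [ "$3" = 3 ] ;;   # only on the 3rd soft attempt
        HARD) return 0 ;;       # always on the hard state
        *) return 1 ;;
    esac
}

# Dry run over a few argument sets (word splitting of $args is intentional).
for args in "OK HARD 1" "CRITICAL SOFT 1" "CRITICAL SOFT 3" "CRITICAL HARD 4"; do
    if should_switch_dns $args; then
        echo "$args -> switch DNS"
    else
        echo "$args -> no action"
    fi
done
```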

The handler above logs in to the DNS server over SSH and runs the commands from the file /usr/local/nagios/libexec/eventhandlers/script_dns.sh there:

cat /usr/local/nagios/libexec/eventhandlers/script_dns.sh

/etc/init.d/named stop
# Keep the original zone file; the handler can run twice (soft and hard), so don't overwrite an existing backup
[ -f /var/named/test.com.bk ] || cp /var/named/test.com /var/named/test.com.bk
sed -i 's/192\.168\.6\.209/192.168.6.208/' /var/named/test.com
/etc/init.d/named restart
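The zone edit can also be written as a function that works on any file, which makes it possible to try it on a copy of the zone first. This is a sketch under assumptions: the function name and arguments are hypothetical, it escapes the dots in the old address so they match literally, keeps only the first backup, and avoids sed -i (whose syntax differs between GNU and BSD sed). It does not stop or restart named; that part stays as in script_dns.sh.

```shell
#!/bin/sh
# Hypothetical helper: replace OLD_IP with NEW_IP in a zone file.
# Usage: switch_a_record ZONE_FILE OLD_IP NEW_IP
switch_a_record() {
    zone_file=$1
    old_ip=$2
    new_ip=$3
    # Escape the dots so they match literally instead of "any character".
    old_re=$(printf '%s\n' "$old_ip" | sed 's/\./\\./g')
    # Keep the first backup; a second run (e.g. on the HARD state) must not overwrite it.
    [ -f "$zone_file.bk" ] || cp "$zone_file" "$zone_file.bk" || return 1
    # Write to a temporary file, then move it into place.
    sed "s/$old_re/$new_ip/g" "$zone_file" > "$zone_file.tmp" && mv "$zone_file.tmp" "$zone_file"
}
```

After editing, BIND's named-checkzone test.com /var/named/test.com can be used to check the zone file before restarting named.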

Once the script is ready (saved as /usr/local/nagios/libexec/eventhandlers/dns-edit.sh and made executable), define it as a command in the file where commands are defined:

/usr/local/nagios/etc/objects/commands.cfg

define command{
        command_name    dns-edit
        command_line    /usr/local/nagios/libexec/eventhandlers/dns-edit.sh $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEATTEMPT$
        }

Then set that command as the event_handler of the service, so that Nagios runs it when the service changes state:

define service{
        use                             generic-service          ; Name of service template to use
        host_name                       webtest1
        service_description             HTTP
        check_command                   check_http
        notifications_enabled           0
        max_check_attempts              4
        event_handler                   dns-edit
        }

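Add the service definition above to the configuration file for the host.

Once everything is configured correctly, restart Nagios. The script will then run when the HTTP service on webtest1 goes CRITICAL: on the 3rd soft attempt, and again when the state becomes hard.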


Regards
Syamkumar,M









