[torqueusers] Nagios Plugins for Torque

Jay Srinivasan jay at nersc.gov
Thu Apr 19 16:21:32 MDT 2007


Chris Vaughan wrote:
> Hi,
> 
> I was wondering if anyone has developed any Nagios plugins for Torque? 
> If so would you mind sharing?
> 
> I'm looking for something to monitor the moms and the server.  Can
> anyone think of any other services related to torque that would be good
> to monitor?
> 
> Thanks,
> 

Here is a little C program that checks the status of the server. You can
run it as a Nagios plugin directly, i.e. in services.cfg use

define service{
        use                             generic-service
...
        service_description             PBSSERVER
        check_command                   chk_pbsserv
        }

and in checkcommands.cfg:

# 'chk_pbsserv' command definition
define command{
        command_name    chk_pbsserv
        command_line    $USER1$/chk_pbsserv
}

and run it as a service check for any one node you choose.

/* chk_pbsserv.c

cc chk_pbsserv.c /path/to/libtorque.a -o chk_pbsserv \
                -I /path/to/torque/include
*/

#include <pbs_error.h>
#include <pbs_ifl.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int main(int argc, char **argv)
{
        int servid;

        servid = pbs_connect(NULL);

        if (servid <= 0) {
                printf("Error: Could not connect to pbs server: %d\n",
                       pbs_errno);
                exit(2);
        }
        printf("OK: PBS Server\n");
        pbs_disconnect(servid);
        exit(0);
}

To check MOMs you can periodically run pbsnodes -l and parse its output
to see if any MOMs are listed as being down -- that would be a
service check for each node that a MOM is running on.
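
A rough sketch of that MOM check, in the same spirit as the server
check above. It is untested against a live cluster and rests on a few
assumptions: that pbsnodes -l prints one "nodename state" pair per line
for nodes that are down, offline or unknown, that pbsnodes exits 0 on
success, and that the node name is handed in by the caller. The names
node_listed and chk_pbsmom are made up for this example; they are not
part of Torque.

```c
#define _POSIX_C_SOURCE 200809L  /* for popen/pclose */
#include <stdio.h>
#include <string.h>

/* Return 1 if `node` is the first word on any line read from `in`,
 * 0 otherwise. Assumption: one "nodename state" pair per line.
 * The whole stream is read so a piped command can finish cleanly. */
int node_listed(FILE *in, const char *node)
{
        char line[512], name[256];
        int found = 0;

        while (fgets(line, sizeof line, in) != NULL) {
                if (sscanf(line, "%255s", name) == 1 &&
                    strcmp(name, node) == 0)
                        found = 1;
        }
        return found;
}

/* Run pbsnodes -l and map the result to a Nagios exit code:
 * 0 = OK, 2 = CRITICAL, 3 = UNKNOWN. */
int chk_pbsmom(const char *node)
{
        FILE *p = popen("pbsnodes -l", "r");
        int listed;

        if (p == NULL) {
                printf("UNKNOWN: could not run pbsnodes\n");
                return 3;
        }
        listed = node_listed(p, node);

        /* Assumption: a nonzero status means pbsnodes itself failed
         * (not installed, server unreachable), so the listing cannot
         * be trusted either way. */
        if (pclose(p) != 0) {
                printf("UNKNOWN: pbsnodes failed\n");
                return 3;
        }
        if (listed) {
                printf("CRITICAL: MOM on %s is listed by pbsnodes -l\n", node);
                return 2;
        }
        printf("OK: MOM on %s\n", node);
        return 0;
}
```

A plugin built around this would only need a main that checks argc and
calls exit(chk_pbsmom(argv[1])); the command definition would then pass
$HOSTNAME$ as the argument. Keeping the parsing in node_listed, apart
from the popen call, makes it easy to try against a saved copy of the
pbsnodes output.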

Jay

