[torqueusers] Nagios Plugins for Torque
Jay Srinivasan
jay at nersc.gov
Thu Apr 19 16:21:32 MDT 2007
Chris Vaughan wrote:
> Hi,
>
> I was wondering if anyone has developed any Nagios plugins for Torque?
> If so would you mind sharing?
>
> I'm looking for something to monitor the moms and the server. Can
> anyone think of any other services related to torque that would be good
> to monitor?
>
> Thanks,
>
Here is a little C program that checks the status of the server. You can
run it as a Nagios plugin directly, i.e. in services.cfg use
    define service{
        use                     generic-service
        ...
        service_description     PBSSERVER
        check_command           chk_pbsserv
    }
and in checkcommands.cfg:
    # 'chk_pbsserv' command definition
    define command{
        command_name    chk_pbsserv
        command_line    $USER1$/chk_pbsserv
    }
and run it as a service check on any one node you choose.
/* chk_pbsserv.c
 *
 * Build:
 *   cc chk_pbsserv.c /path/to/libtorque.a -o chk_pbsserv \
 *      -I /path/to/torque/include
 */
#include <pbs_error.h>
#include <pbs_ifl.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int servid;

    /* NULL asks for a connection to the default server */
    servid = pbs_connect(NULL);
    if (servid <= 0) {
        printf("Error: Could not connect to pbs server: %d\n", pbs_errno);
        exit(2);    /* Nagios CRITICAL */
    }
    printf("OK: PBS Server\n");
    pbs_disconnect(servid);
    exit(0);        /* Nagios OK */
}
To check MOMs you can periodically run pbsnodes -l and parse its output
to see whether any MOMs are listed as being down. That would be a
service check for each node that a MOM is running on.
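
A rough sketch of that MOM check follows. It has not been run against a
live Torque install. It assumes that pbsnodes -l prints one
"<node> <state>" pair per line and exits 0 on success. The names
chk_pbsmom, node_is_listed and check_mom are made up for illustration.

```c
/* chk_pbsmom.c -- hypothetical sketch; function names are illustrative only */
#define _POSIX_C_SOURCE 200809L   /* for popen/pclose */
#include <stdio.h>
#include <string.h>

#define NAGIOS_OK       0
#define NAGIOS_CRITICAL 2
#define NAGIOS_UNKNOWN  3

/* Scan a "pbsnodes -l" style listing (assumed format: "<node> <state>" per
 * line). Returns 1 and copies the state if the node is listed, else 0.
 * Reads to end of input so the producing process is never cut off early. */
int node_is_listed(FILE *listing, const char *node,
                   char *state, size_t state_len)
{
    char line[512], name[256], st[256];
    int found = 0;

    while (fgets(line, sizeof line, listing) != NULL) {
        if (!found &&
            sscanf(line, "%255s %255s", name, st) == 2 &&
            strcmp(name, node) == 0) {
            snprintf(state, state_len, "%s", st);
            found = 1;
        }
    }
    return found;
}

/* Run "pbsnodes -l" and map the result for one node to a Nagios exit code. */
int check_mom(const char *node)
{
    char state[256] = "";
    int listed, status;
    FILE *p = popen("pbsnodes -l", "r");

    if (p == NULL) {
        printf("UNKNOWN: could not run pbsnodes\n");
        return NAGIOS_UNKNOWN;
    }
    listed = node_is_listed(p, node, state, sizeof state);
    status = pclose(p);

    /* Assumption: a nonzero status means pbsnodes itself failed,
     * e.g. it could not reach the server, so the listing is not trusted. */
    if (status != 0) {
        printf("UNKNOWN: pbsnodes failed\n");
        return NAGIOS_UNKNOWN;
    }
    if (listed) {
        printf("CRITICAL: MOM on %s is %s\n", node, state);
        return NAGIOS_CRITICAL;
    }
    printf("OK: MOM on %s\n", node);
    return NAGIOS_OK;
}
```

A main() for the plugin would then only need to return check_mom(argv[1]),
with the command definition passing the node name as the argument. The
name has to match what pbsnodes prints for that node.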
Jay