[torqueusers] RE: Nagios Plugins for Torque (Chris Vaughan)

Chris Vaughan chris at clusterresources.com
Tue Apr 17 05:02:08 MDT 2007


Brian,

Thanks for your reply.

Brian_Gupta at timeinc.com wrote:
> I would definitely monitor whether individual queues are working. 
>
> I feel that the following should also be monitored and graphed over
> time: CPU/node load distribution, job run times, memory usage, and
> network IO. If you are using a Cluster file system, I would also monitor
> IOWait, and FS usage. 
>
> These and other metrics are useful for cluster capacity planning. 
>
> One thing I am wondering: why don't you use Moab? It integrates with
> Torque/OpenPBS and provides monitoring and reporting (among other
> things), and your company sells it.
> http://www.clusterresources.com/pages/products/moab-cluster-suite.php
>
> -Brian
>   
We do use Moab to keep track of Torque and node status.  Where Nagios 
comes into play is in finding what's causing the errors.  Did a network 
switch go down?  Did DHCP crash, or did a node reboot and get the wrong 
IP?  These checks can be coded into Moab using its native interface, but 
why not use the existing tools that come with Nagios? 
> P.S. - I don't work with Torque anymore, but if you are interested, I
> can set up a test environment at home, and work with you on getting
> Nagios/Torque integration working. I've been meaning to test some HA
> ideas I had, but since I don't work with it anymore, and time has been
> scarce, it fell by the wayside.
>   
I don't think it should be too hard to get simple stuff from Torque.  
For Moab deployments we usually use just one queue in Torque and have 
Moab govern the usage of that queue, which provides a lot more 
flexibility. 
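
As a rough illustration of the kind of "simple stuff" a Nagios check could 
pull from Torque, here is a minimal Python sketch, not a finished plugin.  
It assumes that "pbsnodes -a" prints each node name on an unindented line 
followed by indented "key = value" attributes, including a comma-separated 
"state" entry.  The function names, the set of states treated as bad, and 
the warning/critical thresholds are all hypothetical choices made for this 
example; only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are 
the standard Nagios plugin convention.

```python
import subprocess

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: node states that should count as "unavailable".
BAD_STATES = {"down", "offline", "unknown"}


def parse_node_states(text):
    """Return {node_name: set_of_states} from pbsnodes-style text."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines separate node records
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = line.strip()
            nodes[current] = set()
        elif current is not None:
            key, sep, value = line.strip().partition(" = ")
            if sep and key == "state":
                nodes[current] = {s.strip() for s in value.split(",")}
    return nodes


def evaluate(nodes, warn=1, crit=2):
    """Map parsed node states to a (exit_code, message) pair."""
    if not nodes:
        return UNKNOWN, "UNKNOWN - no nodes reported"
    bad = sorted(n for n, states in nodes.items() if states & BAD_STATES)
    summary = "%d of %d nodes unavailable" % (len(bad), len(nodes))
    if bad:
        summary += " (" + ", ".join(bad) + ")"
    if len(bad) >= crit:
        return CRITICAL, "CRITICAL - " + summary
    if len(bad) >= warn:
        return WARNING, "WARNING - " + summary
    return OK, "OK - " + summary


def main():
    """Run pbsnodes, print one status line, and return the exit code."""
    try:
        out = subprocess.run(
            ["pbsnodes", "-a"], capture_output=True, text=True,
            timeout=10, check=True,
        ).stdout
    except (OSError, subprocess.SubprocessError) as exc:
        print("UNKNOWN - could not run pbsnodes: %s" % exc)
        return UNKNOWN
    code, message = evaluate(parse_node_states(out))
    print(message)
    return code
```

To use it as an actual check, the script would end with 
"import sys; sys.exit(main())" so that Nagios sees the exit code.  Keeping 
the parsing and the threshold logic in separate functions means they can be 
exercised against saved pbsnodes output without a live pbs_server.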


> -----Original Message-----
> From: Chris Vaughan <chris at clusterresources.com>
> Subject: [torqueusers] Nagios Plugins for Torque
>
> Hi,
>
> I was wondering if anyone has developed any Nagios plugins for Torque?  
> If so would you mind sharing?
>
> I'm looking for something to monitor the moms and the server.  Can 
> anyone think of any other services related to torque that would be good 
> to monitor?
>
> Thanks,
>
>   


-- 
Chris Vaughan
EMEA Systems Engineer
Cluster Resources, Ltd.
Direct - UK Office:  +44 (0)1223 437 132
US Headquarters:  +1 801 717 3700
Skype: supercomputer1
www.clusterresources.co.uk


