[torqueusers] node marked falsely down

Garrick Staples garrick at usc.edu
Wed Nov 1 08:09:04 MST 2006


On Wed, Nov 01, 2006 at 01:33:26PM +0100, Julian Hagenauer alleged:
> Hi,
> I have a host running the torque server and scheduler, and a node running torque mom.
> The node is shown as up, as it should be, but after the node has gone offline several times, pbsnodes keeps it permanently marked as offline, although tcpdump shows that UDP packets are sent from node to host (size 329) and from host to node (size 26).
> What may be wrong?
> 
> Can anybody explain exactly how the health check is done? The problem is that I want the host to recognize that the node is free as soon as possible.
> 

Nothing in TORQUE marks nodes "offline".  That is only done by admins.
Maybe you have a cronjob somewhere marking nodes offline?
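
To check whether a node really carries the admin-set "offline" flag rather than the "down" state, the output of `pbsnodes -l` can be filtered. The following is only a rough sketch: it assumes each line of that output is a node name followed by a comma-separated state list, and the helper names (`offline_nodes`, `current_listing`) and the sample node names are made up for illustration.

```python
import subprocess
from typing import List


def offline_nodes(listing: str) -> List[str]:
    """Return the names of nodes whose state list contains 'offline'.

    Assumes each non-empty line looks like '<name> <state>[,<state>...]'.
    """
    names = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or unexpected lines
        name, states = fields[0], fields[1].split(",")
        if "offline" in states:
            names.append(name)
    return names


def current_listing() -> str:
    """Run `pbsnodes -l` on the server host and return its text output."""
    result = subprocess.run(
        ["pbsnodes", "-l"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical sample in the assumed format, for trying the parser
# without a running pbs_server.
SAMPLE = "node01    down\nnode02    offline\nnode03    down,offline\n"
```

Calling `offline_nodes(current_listing())` on the server host would list the nodes someone (or some script) has marked offline. The flag can be cleared with `pbsnodes -c <nodename>`; a node that is only "down" is a different matter, since that state is not set by an admin.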

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California