[torqueusers] nodes switching back to state down
H.Schulz at fz-rossendorf.de
Thu Jan 12 01:55:09 MST 2006
I recently installed TORQUE v2.0.0p4. Since then, some nodes (not all)
switch back to state down shortly after I set them to free with qmgr,
usually within 1-2 minutes. During that window, short jobs can be
submitted and they are executed. pbs_mom is running on the affected
nodes, and neither restarting pbs_mom nor rebooting the machine helps.
The pbs_server log shows the following:
01/12/2006 09:49:15;0004;PBS_Server;node;cn49;attributes set: at request of schulzh at ...
01/12/2006 09:49:15;0004;PBS_Server;node;cn49;node cn49 state changed from down to free
01/12/2006 09:49:15;0004;PBS_Server;node;cn49;attributes set: state =
01/12/2006 09:50:29;0004;PBS_Server;Svr;check_nodes;node cn49 not detected in 58830 seconds, marking node down
01/12/2006 09:50:29;0040;PBS_Server;Req;update_node_state;node cn49
What is the problem here?
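
For reference, the "not detected in 58830 seconds" figure in the
check_nodes line can be turned into a wall-clock time. The snippet below
is only a rough sketch: it assumes the semicolon-separated layout seen in
the log excerpt applies to the whole line, and last_contact is a made-up
helper name, not part of TORQUE. If the figure is read as the time since
the server last heard from the node, that last contact falls on the
evening of the previous day, long before the node was set to free.

```python
import re
from datetime import datetime, timedelta

# check_nodes entry copied from the pbs_server log in this message.
LOG_LINE = ("01/12/2006 09:50:29;0004;PBS_Server;Svr;check_nodes;"
            "node cn49 not detected in 58830 seconds, marking node down")


def last_contact(line):
    """Return (node, timestamp minus the reported silence) for one log line.

    Hypothetical helper: assumes six ';'-separated fields with the
    timestamp first and the free-text message last.
    """
    fields = line.split(";", 5)
    stamp = datetime.strptime(fields[0], "%m/%d/%Y %H:%M:%S")
    match = re.search(r"node (\S+) not detected in (\d+) seconds", fields[5])
    if match is None:
        raise ValueError("not a 'not detected' message")
    node = match.group(1)
    silent = timedelta(seconds=int(match.group(2)))
    return node, stamp - silent


node, seen = last_contact(LOG_LINE)
print(node, seen)  # cn49 2006-01-11 17:29:59
```

58830 seconds is 16 hours, 20 minutes and 30 seconds, so the interval
reported by the server is far longer than the roughly one minute that
passed between setting the node to free and it being marked down again.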