[torqueusers] Maui does not know queue to node map? - queue system is failing, please HELP !
Milind
gadre at wisc.edu
Fri Feb 3 10:57:06 MST 2012
Hello,
I am a cluster administrator at the University of Wisconsin-Madison. Our cluster runs Maui (3.2.5) and OpenPBS 2.3 on ROCKS 5.3.
For the last few days our queue system has been misbehaving: PBS accepts jobs and places them in the right queues, but the scheduler then does something in between, and the job ends up on a 'wrong' compute node (one that is not supposed to be in that queue), while PBS still lists the job as running in the right queue.
For example, PBS shows this:
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
60606.bardeen             Cu1_a60_mov      <user>          00:52:05 R fast
But the job is on a compute node that is not in the queue "fast" at all! The PBS node list (/opt/torque/server_priv/nodes) looks fine, and there are no errors in the Maui logs.
In the PBS logs, I get this message:
10:54:19;0008;PBS_Server;Job;60606.bardeen.msae.wisc.edu;Job Modified at request of maui at bardeen.msae.wisc.edu
My guess is that Maui is doing something wrong, or that it does not know the correct queue-to-node mapping.
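One way to cross-check the mapping by hand is to read the nodes file and see whether the node a job landed on carries the property its queue expects. The following is only a minimal sketch: it assumes the nodes file uses the usual "hostname np=N property ..." layout and that the queue "fast" is tied to a node property of the same name, which may not match the actual setup. The host names and file contents in the sample are hypothetical.

```python
def parse_nodes(text):
    """Map each node name to the set of properties listed after it."""
    nodes = {}
    for line in text.splitlines():
        # Drop comments and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, *fields = line.split()
        # Fields containing "=" (such as np=8) are attributes, not properties.
        nodes[name] = {f for f in fields if "=" not in f}
    return nodes


def node_has_property(nodes, host, prop):
    """Return True if the given host lists the given property."""
    return prop in nodes.get(host, set())


# Hypothetical sample standing in for /opt/torque/server_priv/nodes.
sample = """\
# name       attributes  properties
compute-0-0  np=8        fast
compute-0-1  np=8        slow
"""

nodes = parse_nodes(sample)
# Hypothetical case: a job from queue "fast" is found on compute-0-1.
print(node_has_property(nodes, "compute-0-1", "fast"))
```

If this prints False for the node a "fast" job is actually running on, the nodes file and the placement disagree, which would point at the scheduler rather than at the node list.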
Can someone suggest what is going on, or guide me toward solving this issue?
Thanks!
Milind