[torqueusers] Maui does not know queue to node map? - queue system is failing, please HELP !
Milind
gadre at wisc.edu
Fri Feb 3 12:13:32 MST 2012
Hello,
I am a cluster administrator at the University of Wisconsin-Madison. Our cluster runs Maui 3.2.5 and PBS 2.4.6 on ROCKS 5.3. (Sorry, I wrote OpenPBS in my last email.)
For the last few days, our queue system has been behaving erratically: PBS accepts jobs and puts them in the right queues, but the scheduler then intervenes and the job ends up on a 'wrong' compute node (one that is not supposed to be in that queue), all the while PBS still lists the job as running under the right queue.
For example, PBS shows this:
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
60606.bardeen             Cu1_a60_mov      <user>          00:52:05 R fast
but the job is on a compute node that is not in the queue "fast" at all!
The PBS node list (/opt/torque/server_priv/nodes) looks fine, and there are no errors in the Maui logs. In the PBS logs, I get this message:
10:54:19;0008;PBS_Server;Job;60606.bardeen.msae.wisc.edu;Job Modified at request of maui at bardeen.msae.wisc.edu
My guess is that Maui is doing something wrong, or that it does not know the correct queue-to-node mapping.
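A rough way to cross-check where a job actually landed is sketched below in Python. It rests on assumptions that may not match every setup: that each queue is tied to its nodes through a node property of the same name in the nodes file (for example, a "fast" property), and that the job's hosts are copied from the exec_host field of `qstat -f`. The node names, the sample data, and the function names are all made up for illustration.

```python
def nodes_with_property(nodes_text, prop):
    """Return the names of nodes whose nodes-file line lists `prop`."""
    matches = []
    for line in nodes_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name, attrs = fields[0], fields[1:]
        # Entries such as "np=8" are key=value settings; bare words are
        # treated here as node properties.
        props = {a for a in attrs if "=" not in a}
        if prop in props:
            matches.append(name)
    return matches


def hosts_from_exec_host(exec_host):
    """Extract unique host names from a value like 'n1/0+n1/1+n2/0'."""
    return sorted({part.split("/", 1)[0] for part in exec_host.split("+") if part})


def misplaced_hosts(nodes_text, prop, exec_host):
    """List hosts the job runs on that do not carry the expected property."""
    allowed = set(nodes_with_property(nodes_text, prop))
    return [h for h in hosts_from_exec_host(exec_host) if h not in allowed]


# Hypothetical nodes-file contents; real node names and properties will differ.
SAMPLE_NODES = """\
compute-0-0 np=8 fast
compute-0-1 np=8 fast
compute-0-2 np=8 slow
"""

if __name__ == "__main__":
    # Hypothetical exec_host value for a job submitted to the "fast" queue.
    print(misplaced_hosts(SAMPLE_NODES, "fast", "compute-0-2/0+compute-0-2/1"))
```

An empty list would mean every host the job is using carries the expected property. Any name that is printed is a node the job should not have been given under the assumed mapping.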
Can someone suggest what is going on, or guide me toward solving this issue?
Thanks!
Milind