[torqueusers] Maui does not know queue to node map? - queue system is failing, please HELP !

Milind gadre at wisc.edu
Fri Feb 3 10:57:06 MST 2012


I am a cluster administrator at the University of Wisconsin-Madison. Our cluster runs Maui 3.2.5 and OpenPBS 2.3 on ROCKS 5.3.
For the last few days, our queue system has gone haywire. PBS accepts jobs and places them in the correct queues, but the scheduler apparently does something in between, and the job ends up on a "wrong" compute node (one that is not supposed to belong to that queue). Meanwhile, PBS still lists the job as running in the correct queue.

For example, PBS shows this:

Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
60606.bardeen             Cu1_a60_mov      <user>         00:52:05 R fast      

but the job is running on a compute node that is not in the queue "fast" at all! The PBS node list (/opt/torque/server_priv/nodes) looks fine, and there are no errors in the Maui logs.
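One way to double-check the node list is to confirm which hosts actually carry the property a queue relies on. Below is a minimal Python sketch. It assumes, although the post does not say so, that the queue "fast" is tied to nodes through a node property also named `fast`. It also assumes that each line of the nodes file follows the usual `hostname [np=N] [property ...]` layout. The host names in the sample are hypothetical.

```python
from collections import defaultdict


def parse_nodes_file(text):
    """Map each node property to the set of hosts that carry it.

    Assumes Torque-style lines: 'hostname [key=value ...] [property ...]'.
    Blank lines and '#' comments are skipped.
    """
    by_property = defaultdict(set)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, *fields = line.split()
        for field in fields:
            # Attributes such as np=8 are not properties; skip them.
            if "=" not in field:
                by_property[field].add(host)
    return by_property


# Hypothetical sample; substitute the real contents of server_priv/nodes.
SAMPLE = """
compute-0-0 np=8 fast
compute-0-1 np=8 fast
compute-0-2 np=8 slow   # assumed not to belong to the 'fast' queue
"""

nodes = parse_nodes_file(SAMPLE)
print(sorted(nodes["fast"]))
print("compute-0-2" in nodes["fast"])
```

If the node a misplaced job landed on shows up under the queue's property here, the node file is the problem. If it does not, the scheduler side is the likelier suspect.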
In the PBS server logs, I see this message:

 10:54:19;0008;PBS_Server;Job;60606.bardeen.msae.wisc.edu;Job Modified at request of maui at bardeen.msae.wisc.edu
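To see which jobs this happens to, the server log can be filtered for that message. The sketch below assumes each record has the six semicolon-separated fields visible in the line above: timestamp, event code, daemon, object type, object name, and message. The function name `maui_modified_jobs` is hypothetical.

```python
def maui_modified_jobs(lines):
    """Return job ids whose log records say they were modified at Maui's request."""
    jobs = []
    for line in lines:
        # Split into at most six fields; the message itself may contain ';'.
        parts = line.strip().split(";", 5)
        if len(parts) != 6:
            continue  # not a record in the expected format
        _stamp, _code, daemon, obj_type, obj_name, message = parts
        if (daemon == "PBS_Server" and obj_type == "Job"
                and message.startswith("Job Modified at request of maui")):
            jobs.append(obj_name)
    return jobs


log = [
    " 10:54:19;0008;PBS_Server;Job;60606.bardeen.msae.wisc.edu;"
    "Job Modified at request of maui at bardeen.msae.wisc.edu",
]
print(maui_modified_jobs(log))
```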

My guess is that Maui is doing something wrong, or that it does not know the correct queue-to-node mapping.
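The mismatch can also be checked job by job. `qstat -f` reports an `exec_host` attribute listing the slots the job was given, in the form `host/slot+host/slot`. The minimal sketch below extracts the host names and flags any that fall outside the nodes the queue is meant to use. The `queue_nodes` set and the host names are hypothetical stand-ins for whatever mapping the cluster actually defines.

```python
def exec_hosts(exec_host):
    """Split an exec_host value such as 'n1/1+n1/0+n2/0' into unique host names."""
    hosts = []
    for slot in exec_host.split("+"):
        host = slot.strip().split("/", 1)[0]
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def stray_hosts(exec_host, allowed):
    """Return the hosts a job runs on that are not in its queue's allowed set."""
    return [h for h in exec_hosts(exec_host) if h not in allowed]


# Hypothetical values for a job in queue "fast".
queue_nodes = {"compute-0-0", "compute-0-1"}
print(stray_hosts("compute-0-2/1+compute-0-2/0", queue_nodes))
```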

Can someone suggest what is going on, or guide me toward a fix?

Thanks!

