[torqueusers] Maui does not know queue to node map? - queue system is failing, please HELP !

Milind gadre at wisc.edu
Fri Feb 3 12:13:32 MST 2012


Hello,

I am a cluster administrator at the University of Wisconsin-Madison. Our cluster runs Maui 3.2.5 and PBS 2.4.6 on ROCKS 5.3. (Sorry, I wrote OpenPBS in my last email.)

For the last few days, our queue system has been haywire: PBS accepts jobs and puts them in the right queues, but the scheduler then does something in between, and the job ends up on a 'wrong' compute node (one that is not supposed to be in that queue), while PBS still lists the job as running under the right queue.

For example, PBS shows this:

Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
60606.bardeen             Cu1_a60_mov      <user>         00:52:05 R fast      

But the job is on a compute node that is not in the queue "fast" at all! The PBS node list (/opt/torque/server_priv/nodes) is fine, and there are no errors in the Maui logs. In the PBS logs, I get this message:

 10:54:19;0008;PBS_Server;Job;60606.bardeen.msae.wisc.edu;Job Modified at request of maui at bardeen.msae.wisc.edu

My guess is that Maui is doing something wrong, or that it does not know the correct queue-to-node mapping.
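
One way to cross-check the mapping by hand is sketched below. It is only a rough illustration and rests on assumptions not stated above: that queues are tied to nodes through node properties listed in the nodes file (lines of the form "name np=N property ..."), and that lines starting with # are comments. The node names, the properties, and the helper names parse_nodes_file and node_allowed are all hypothetical.

```python
def parse_nodes_file(text):
    """Return {node_name: set_of_properties} from nodes-file style text.

    Assumed line format: "<name> [key=value ...] [property ...]".
    Tokens containing '=' (such as np=8) are treated as attributes,
    everything else as a node property.
    """
    nodes = {}
    for line in text.splitlines():
        # Drop anything after '#' (assumed comment syntax) and blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, *tokens = line.split()
        nodes[name] = {t for t in tokens if "=" not in t}
    return nodes


def node_allowed(nodes, node, required_property):
    """True if the node is known and carries the required property."""
    return required_property in nodes.get(node, set())


# Hypothetical sample data, not taken from the real cluster.
SAMPLE = """\
# name        attributes  properties
compute-0-0   np=8        fast
compute-0-1   np=8        fast
compute-1-0   np=8        slow
"""

nodes = parse_nodes_file(SAMPLE)
print(node_allowed(nodes, "compute-0-0", "fast"))
print(node_allowed(nodes, "compute-1-0", "fast"))
```

Feeding it the real nodes file and the execution host of a misplaced job would show whether that node carries the property the queue is expected to require. If it does not, the placement decision was made without that constraint.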

Can someone suggest what is going on, or guide me toward solving this issue?

Thanks!

Milind

