[torqueusers] Maui and Torque not agreeing on jobstate

Philip Peartree P.Peartree at postgrad.manchester.ac.uk
Wed Sep 24 07:40:49 MDT 2008


Hi All,

Apologies for cross-posting, but I wasn't sure where this would fit
best. I have a job that is stuck in the queued state in Torque (qstat
shows its state as Q), while Maui believes the job is active (showq
lists it as active and running). In showq the job's remaining time is
not counting down, and the job is definitely not running on any of the
nodes it is supposed to be running on. diagnose -j says:

Name  State    Par  Proc  QOS  WCLimit   R  Min  User      Group  Account  QueuedTime  Network  Opsys   Arch    Mem  Disk  Procs  Class         Features

21    Running  DEF  144   DEF  00:02:00  1  144  mcdiypp2  nmrc   -        00:28:33    [NONE]   [NONE]  [NONE]  >=0  >=0   NC0    [short_2h:1]  [NONE]

And qstat -f says:

Job Id: 21.steel.mib.man.ac.uk
     Job_Name = qsubtest.com
     Job_Owner = mcdiypp2 at steel.mib.man.ac.uk
     job_state = Q
     queue = short_2h
     server = steel.mib.man.ac.uk
     Checkpoint = u
     ctime = Wed Sep 24 14:08:27 2008
     Error_Path = steel.mib.man.ac.uk:/home/mcdiypp2/qsubtest.com.e21
     Hold_Types = n
     Join_Path = n
     Keep_Files = n
     Mail_Points = a
     mtime = Wed Sep 24 14:37:54 2008
     Output_Path = steel.mib.man.ac.uk:/home/mcdiypp2/qsubtest.com.o21
     Priority = 0
     qtime = Wed Sep 24 14:08:27 2008
     Rerunable = True
     Resource_List.nodect = 18
     Resource_List.nodes = 18:ppn=8
     Resource_List.walltime = 00:02:00
     Variable_List = PBS_O_HOME=/home/mcdiypp2,PBS_O_LANG=en_GB.UTF-8,
         PBS_O_LOGNAME=mcdiypp2,
         PBS_O_PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/b
         in:/usr/games:/opt/torque/2.3.3/bin:/opt/torque/2.3.3/sbin:/opt/maui/3
         .2.6p19/bin:/opt/maui/3.2.6p19/sbin:/opt/openmpi/1.2.6/bin,
         PBS_O_MAIL=/var/mail/mcdiypp2,PBS_O_SHELL=/bin/bash,
         PBS_SERVER=steel.mib.man.ac.uk,PBS_O_HOST=steel.mib.man.ac.uk,
         PBS_O_WORKDIR=/home/mcdiypp2,PBS_O_QUEUE=route
     etime = Wed Sep 24 14:08:27 2008
     exit_status = -3
     submit_args = qsubtest.com
     start_time = Wed Sep 24 14:08:28 2008
     start_count = 1756


I don't understand why qstat reports Priority = 0 when Maui lists the
job's startpriority as 60. Torque and Maui appear not to be
communicating properly somewhere. Could someone shed some light on this?
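
In case it helps anyone check for the same symptom, here is a minimal
Python sketch that parses saved qstat -f text (like the output above)
and flags a job that is still queued but has a large start_count. The
function names, the trimmed sample text and the threshold of 10 start
attempts are my own assumptions for illustration, not anything defined
by Torque or Maui.

```python
import re

# An attribute line looks like "    job_state = Q"; wrapped continuation
# lines (e.g. inside Variable_List) have no " = " with spaces around it.
ATTR_RE = re.compile(r"^\s+([\w.]+) = (.*)$")


def parse_qstat_full(text):
    """Parse the text of `qstat -f` for a single job into a dict."""
    attrs = {}
    last_key = None
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            attrs["Job_Id"] = line.split(":", 1)[1].strip()
            last_key = None
            continue
        match = ATTR_RE.match(line)
        if match:
            last_key = match.group(1)
            attrs[last_key] = match.group(2).strip()
        elif last_key and line.strip():
            # Continuation of a wrapped value: join without a separator.
            attrs[last_key] += line.strip()
    return attrs


def looks_stuck(attrs, max_starts=10):
    """Heuristic only (assumed threshold): queued, yet started many times."""
    return (attrs.get("job_state") == "Q"
            and int(attrs.get("start_count", 0)) > max_starts)


# Trimmed copy of the qstat -f output from this post.
SAMPLE = """Job Id: 21.steel.mib.man.ac.uk
     Job_Name = qsubtest.com
     job_state = Q
     queue = short_2h
     Resource_List.nodes = 18:ppn=8
     Variable_List = PBS_O_HOME=/home/mcdiypp2,PBS_O_LANG=en_GB.UTF-8,
         PBS_O_LOGNAME=mcdiypp2,
         PBS_O_WORKDIR=/home/mcdiypp2,PBS_O_QUEUE=route
     exit_status = -3
     start_count = 1756
"""

if __name__ == "__main__":
    job = parse_qstat_full(SAMPLE)
    print(job["Job_Id"], job["job_state"], job["start_count"])
    print("looks stuck:", looks_stuck(job))
```

This only looks at the Torque side. Comparing against Maui would mean
parsing the showq or diagnose -j output as well, which I have not
attempted here.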

Philip Peartree


