[torqueusers] jobs stuck in queue

Azher Mughal azher at hep.caltech.edu
Fri Aug 31 13:23:39 MDT 2012


Hi all,

I have jobs stuck in the queue. A sample job and the output for its 
related node are below.

The server is Torque 2.3.7 with Maui.

Any help would be appreciated.

Thanks
-Azher

[root@omega server_priv]# checkjob -v 1621827.omega


checking job 1621827 (RM job '1621827.omega.cluster.hep.caltech.edu')

State: Idle
Creds:  user:bays  group:minos  class:minos  qos:DEFAULT
WallTime: 00:00:00 of 10:00:00:00
SubmitTime: Mon Aug 27 16:31:10
   (Time Queued  Total: 3:19:49:56  Eligible: 00:00:00)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Exec:  ''  ExecSize: 0  ImageSize: 0
Dedicated Resources Per Task: PROCS: 1  MEM: 1024M
NodeAccess: SHARED
NodeCount: 0

IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [ALL]
SystemQueueTime: Fri Aug 31 09:12:14

Flags:       HOSTLIST RESTARTABLE
HostList:
   [node151:1]
Holds:    Defer
Messages:  exceeds available partition procs
PE:  1.00  StartPriority:  20188
cannot select job 1621827 for partition DEFAULT (job hold active)

[root@omega server_priv]# pbsnodes node151
node151
      state = free
      np = 16
      properties = sl5,MEM24G,workdisk
      ntype = cluster
      status = opsys=linux,uname=Linux node151 2.6.18-274.18.1.el5 #1 SMP Thu Feb 9 12:20:03 EST 2012 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=68747,totmem=24675820kb,availmem=24521680kb,physmem=24675820kb,ncpus=16,loadave=0.00,gres=,netload=1921779741,size=1628314552kb:1788585084kb,state=free,jobs=,varattr=,rectime=1346440829



