[torqueusers] jobs stuck in queue
Azher Mughal
azher at hep.caltech.edu
Fri Aug 31 13:23:39 MDT 2012
Hi all,
I have jobs stuck in the queue. The output for one of the affected jobs, and
for the node it requests, is below.
The server is TORQUE 2.3.7 with Maui.
Any help?
Thanks
-Azher
[root@omega server_priv]# checkjob -v 1621827.omega
checking job 1621827 (RM job '1621827.omega.cluster.hep.caltech.edu')
State: Idle
Creds: user:bays group:minos class:minos qos:DEFAULT
WallTime: 00:00:00 of 10:00:00:00
SubmitTime: Mon Aug 27 16:31:10
(Time Queued Total: 3:19:49:56 Eligible: 00:00:00)
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1 MEM: 1024M
NodeAccess: SHARED
NodeCount: 0
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
SystemQueueTime: Fri Aug 31 09:12:14
Flags: HOSTLIST RESTARTABLE
HostList:
[node151:1]
Holds: Defer
Messages: exceeds available partition procs
PE: 1.00 StartPriority: 20188
cannot select job 1621827 for partition DEFAULT (job hold active)
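The checkjob output shows a Maui "Defer" hold on the job. Below is a minimal shell sketch for detecting that hold from a script. It assumes the Maui output format shown above ("Holds: Defer"), and the function name has_defer_hold is hypothetical, not part of TORQUE or Maui.

```shell
#!/bin/sh
# has_defer_hold (hypothetical helper): reads `checkjob` output on stdin
# and exits 0 if the "Holds:" line lists a Defer hold, non-zero otherwise.
# Assumes the Maui checkjob format shown above.
has_defer_hold() {
    grep -Eq '^[[:space:]]*Holds:.*Defer'
}
```

For example, "checkjob 1621827 | has_defer_hold && releasehold -a 1621827" would release the job's holds only while it is deferred. This assumes Maui's releasehold -a (release all hold types) is available to the operator. If whatever is behind "exceeds available partition procs" is not fixed, Maui may simply defer the job again.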
[root@omega server_priv]# pbsnodes node151
node151
state = free
np = 16
properties = sl5,MEM24G,workdisk
ntype = cluster
status = opsys=linux,uname=Linux node151 2.6.18-274.18.1.el5 #1 SMP Thu Feb 9 12:20:03 EST 2012 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=68747,totmem=24675820kb,availmem=24521680kb,physmem=24675820kb,ncpus=16,loadave=0.00,gres=,netload=1921779741,size=1628314552kb:1788585084kb,state=free,jobs=,varattr=,rectime=1346440829
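The pbsnodes "status" attribute is a comma-separated list of key=value pairs. A small shell sketch for pulling one field out of it, for instance to confirm that the node reports state=free and an empty jobs= list. The helper name status_field is hypothetical, and values that themselves contain commas are not handled.

```shell
#!/bin/sh
# status_field FIELD STATUS (hypothetical helper): prints the value of
# FIELD from a pbsnodes status string of comma-separated key=value pairs.
# Prints an empty line if FIELD is present but empty (e.g. "jobs=").
status_field() {
    field=$1
    status=$2
    # Split on commas, then keep only the line that starts with "FIELD=".
    printf '%s\n' "$status" | tr ',' '\n' | sed -n "s/^${field}=//p"
}
```

For example, with s set to the status string above, status_field availmem "$s" prints 24521680kb.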