[Mauiusers] Suspended jobs not being resumed
Edgar Leon
edgar at mathcs.emory.edu
Thu Apr 10 14:57:17 MDT 2008
Could someone please help me resolve a problem where suspended jobs
are not being resumed?
checkjob shows the following for a suspended job:
State: Suspended EState: Running
EState 'Running' does not match current state 'Suspended'
cannot select job 3340 for partition DEFAULT (non-idle expected state 'Running')
Here are the details:
My batch system has two queues: default and batch2 (low priority).
Low priority jobs are suspended/resumed.
I submitted 96 jobs to the batch2 queue. Most of them finished and
two are still running.
Many of these jobs were suspended when high priority jobs were
submitted to the default queue, and were later resumed.
However, 10 of the suspended jobs were never resumed and have
been in this state for many hours:
%qstat
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
3304.head job0328 eleon 00:04:46 S batch2
3311.head job0328 eleon 00:02:19 S batch2
3335.head job0328 eleon 00:02:19 S batch2
3336.head job0328 eleon 00:02:22 S batch2
3340.head job0328 eleon 00:51:01 S batch2
3345.head job0328 eleon 00:02:17 S batch2
3346.head job0328 eleon 00:02:13 S batch2
3371.head job0328 eleon 00:02:30 S batch2
3372.head job0328 eleon 00:51:47 S batch2
3373.head job0328 eleon 00:07:35 S batch2
3374.head job0328 eleon 06:01:17 R batch2
3377.head job0328 eleon 06:05:41 R batch2
------------------------------------------------------------------
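For anyone who wants to pull the stuck job ids out of a listing like the one above, here is a minimal Python sketch. It assumes the default six-column qstat layout shown here (id, name, user, time, state, queue); the function name suspended_jobs and the shortened sample text are only illustrative and not part of Torque or Maui.

```python
def suspended_jobs(qstat_text):
    """Return the job ids whose state column is 'S' in qstat-style text."""
    jobs = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Data rows have exactly six whitespace-separated columns;
        # the header row has more, and the dashed rule has '-' as state.
        if len(fields) == 6 and fields[4] == "S":
            jobs.append(fields[0])
    return jobs


# Shortened sample taken from the listing above.
sample = """\
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
3304.head job0328 eleon 00:04:46 S batch2
3340.head job0328 eleon 00:51:01 S batch2
3374.head job0328 eleon 06:01:17 R batch2
"""

if __name__ == "__main__":
    for job_id in suspended_jobs(sample):
        print(job_id)
```

The output of a real qstat run could be passed in the same way, for example after capturing it with subprocess.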
#/usr/local/maui/bin/checkjob 3340
checking job 3340
State: Suspended EState: Running
Creds: user:eleon group:guest class:batch2 qos:low
WallTime: 8:30:09 of 99:23:59:59
Suspended Wall Time: 16:08:44
SubmitTime: Wed Apr 9 11:16:57
(Time Queued Total: 1:01:08:09 Eligible: 8:41:01)
StartDate: -16:37:40 Wed Apr 9 19:47:26
Total Tasks: 1
Req[0] TaskCount: 1 Partition: DEFAULT
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
NodeCount: 1
Allocated Nodes:
[node009:1]
WARNING: allocated node node009 is in state Idle
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 1
PartitionMask: [ALL]
Flags: RESTARTABLE PREEMPTEE
Attr: PREEMPTEE
EState 'Running' does not match current state 'Suspended'
Reservation '3340' (-1:01:08:07 -> 98:22:51:52 Duration: 99:23:59:59)
PE: 1.00 StartPriority: 531
cannot select job 3340 for partition DEFAULT (non-idle expected state 'Running')
----------------------------------------------------------------------
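The mismatch reported above (State 'Suspended' versus EState 'Running') can be checked for mechanically. The following Python sketch only assumes that the checkjob output contains a line of the form "State: X EState: Y", as in the output above; the function name state_mismatch is hypothetical, and nothing here calls Maui itself.

```python
import re


def state_mismatch(checkjob_text):
    """Return (state, expected_state) if they differ, otherwise None."""
    match = re.search(r"State:\s*(\S+)\s+EState:\s*(\S+)", checkjob_text)
    if match is None:
        # No "State: ... EState: ..." line was found in the text.
        return None
    state, expected = match.groups()
    return (state, expected) if state != expected else None


# Two lines copied from the checkjob output above.
sample = "checking job 3340\nState: Suspended EState: Running\n"

if __name__ == "__main__":
    print(state_mismatch(sample))
```

Running this over the checkjob output of each suspended job would show whether all of the stuck jobs share the same mismatch.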
The MAUI log shows:
04/10 14:33:56 INFO: attribute 'PREEMPTEE' set for job 3336
04/10 14:33:56 MPBSJobUpdate(3340,3340.head,TaskList,0)
04/10 14:33:56 INFO: attribute 'PREEMPTEE' set for job 3340
04/10 14:33:56 INFO: job '3340' Priority: 531
04/10 14:33:56 INFO: Cred: 10(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 521(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
-------------------------------------------------------------------
Versions: maui-3.2.6p18 and torque-2.1.6
Here is the maui.cfg file:
# maui.cfg 3.2p8
SERVERHOST head
ADMIN1 root
RMCFG[HEAD] TYPE=PBS
AMCFG[bank] TYPE=NONE
RMPOLLINTERVAL 00:00:30
SERVERPORT 42559
SERVERMODE NORMAL
LOGFILE maui.log
LOGFILEMAXSIZE 300000000
LOGLEVEL 3
PREEMPTPOLICY SUSPEND
RESERVATIONPOLICY NEVER
QOSWEIGHT 1
CLASSCFG[default] QDEF=high
CLASSCFG[batch2] QDEF=low
QOSCFG[high] PRIORITY=200000
QOSCFG[high] QFLAGS=PREEMPTOR
QOSCFG[low] PRIORITY=10
QOSCFG[low] QFLAGS=PREEMPTEE
-------------------------------------------------------------------
I searched the Maui archives and found posts about a patch,
but it is already incorporated in the version that I am using:
http://www.supercluster.org/pipermail/mauiusers/2004-July/001284.html
-------------------------------------------------------------------
Please let me know if there are any explanations or suggestions.
Thanks.
Edgar
Edgar Leon PHONE: (404) 727-2867
Department of Math & Computer Science FAX: (404) 727-5611
400 Dowman Drive, Suite W401 EMAIL: edgar at mathcs.emory.edu
Emory University
Atlanta, GA 30322