[Mauiusers] Suspended jobs not being resumed

Edgar Leon edgar at mathcs.emory.edu
Tue Apr 15 17:11:34 MDT 2008


Ronny,

Thank you for the information you provided.

 > I seem to vaguely remember a problem I had a while ago: suspended jobs
 > would not age and as such increase their priority again.

The suspended jobs have been in that state for a week and they were not
resumed even when there were no other jobs in the batch system.

As you pointed out, the priority of the suspended jobs did not increase
during the last week.

I tried to increase the priority manually. The checkjob output before
and after running setspri was:

EState 'Running' does not match current state 'Suspended'
Reservation '3372' (-6:05:11:18 -> 93:18:48:41  Duration: 99:23:59:59)
PE:  1.00  StartPriority:  552
cannot select job 3372 for partition DEFAULT (non-idle expected state 'Running')
--------------------------------------------------------------------
# /usr/local/maui/bin/setspri 1000 3372

job system priority adjusted
--------------------------------------------------------------------
EState 'Running' does not match current state 'Suspended'
Reservation '3372' (-6:05:11:49 -> 93:18:48:10  Duration: 99:23:59:59)
PE:  1.00  StartPriority:  1000001000  SystemPriority:  1000

cannot select job 3372 for partition DEFAULT (non-idle expected state 'Running')
--------------------------------------------------------------------
However, the job remained in the suspended state and did not run.

I also tried to force the job to run manually, but it remained suspended:

# /usr/local/maui/bin/runjob  -c 3372
INFO:  successfully set hostlist for job '3372' to '1'

# /usr/local/maui/bin/runjob  -f 3372
job '3372' is in state 'Suspended'  (state must be idle)

# /usr/local/maui/bin/runjob  -x 3372
job '3372' is in state 'Suspended'  (state must be idle)

Is there a command to force a suspended job to run?

The only solution that I found was to restart maui.

qstat showed this state for many days:
Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
3304.head           job0328          eleon           00:04:46 S batch2
3311.head           job0328          eleon           00:02:19 S batch2
3335.head           job0328          eleon           00:02:19 S batch2
3336.head           job0328          eleon           00:02:22 S batch2
3340.head           job0328          eleon           00:51:01 S batch2
3345.head           job0328          eleon           00:02:17 S batch2
3346.head           job0328          eleon           00:02:13 S batch2
3371.head           job0328          eleon           00:02:30 S batch2

After restarting maui, without making any changes to maui.cfg, qstat showed:

Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
3304.head           job0328          eleon           00:04:46 R batch2
3311.head           job0328          eleon           00:02:19 R batch2
3335.head           job0328          eleon           00:02:19 R batch2
3336.head           job0328          eleon           00:02:22 R batch2
3340.head           job0328          eleon           00:51:01 R batch2
3345.head           job0328          eleon           00:02:17 R batch2
3346.head           job0328          eleon           00:02:13 R batch2
3371.head           job0328          eleon           00:02:30 R batch2
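
To spot this condition without reading the listing by eye, the state
column of the qstat output can be filtered. The following is a minimal
sketch, assuming the default qstat layout shown above (two header lines,
state in the fifth whitespace-separated field); list_suspended is a
hypothetical helper name, not part of TORQUE or Maui.

```shell
# Hypothetical helper: read default-format qstat output on stdin and
# print the IDs of jobs whose state column (5th field) is "S".
# Assumes the two header lines shown in the listings above.
list_suspended() {
    awk 'NR > 2 && $5 == "S" { print $1 }'
}

# Example use (requires a working qstat):
#   qstat | list_suspended
```

Run against the first listing above, this would print all eight job IDs;
against the second, nothing.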

=========================================================================

 > To work around this you will have to change your config as detailed in

I then modified maui.cfg and restarted maui. These parameters are now
set:

# /usr/local/maui/bin/showconfig -v | grep USAGE
USAGEWEIGHT[0]                    1
USAGEEXECUTIONTIMEWEIGHT[0]       1

The priority of suspended jobs is now increasing.
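
The showconfig output above would correspond to maui.cfg entries along
the following lines. This is a sketch reconstructed from that output,
assuming the usual "PARAMETER VALUE" syntax of maui.cfg; the exact lines
used are not shown in this message, and the linked report should be
consulted for the full explanation.

```
# maui.cfg fragment (sketch; values taken from the showconfig output above)
USAGEWEIGHT               1
USAGEEXECUTIONTIMEWEIGHT  1
```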

Thanks for the help.

Edgar



Ronny T. Lampert wrote, On 04/14/08 09:08:
>> Could someone please help me resolve a problem where suspended jobs
>> are not being resumed?
> 
> I hope I've understood your problem.
> I seem to vaguely remember a problem I had a while ago: suspended jobs
> would not age and as such increase their priority again.
> So other jobs that were only queued, not running, would have a higher
> priority and would run before any suspended jobs.
> 
> 
> To work around this, you will have to change your config as detailed in
> my original problem report:
> 
> http://osdir.com/ml/clustering.maui.user/2006-08/msg00021.html
> 
> 
> Hope this helps,
> Ronny


