[Mauiusers] Does Maui respect EXTENDEDVIOLATION resource limits?

Nick Sonneveld Nicholas.Sonneveld at utas.edu.au
Tue Mar 6 22:58:13 MST 2007


Hullo,

I'm running Maui 3.2.6p19-snap.1169758944 and I'm having trouble getting
it to allow resource overruns for a short time.

Current settings:

whiteout:/var/spool/maui # /apps/maui/bin/showconfig -v | grep RESOURCELIMITPOLICY
RESOURCELIMITPOLICY[0]            PROC:EXTENDEDVIOLATION:CANCEL:00:15:00 MEM:ALWAYS:CANCEL
whiteout:/var/spool/maui #


However, looking at the logs today, I saw:

whiteout:/var/spool/maui/log # grep -i 'violation' maui.log
03/07 11:36:25 MSysRegEvent(JOBRESVIOLATION:  job '3648' in state 'Running' has exceeded PROC resource limit (141 > 100) (action CANCEL will be taken)  job start time: Wed Mar  7 11:35:32
03/07 11:36:25 ALERT:    limit violation action CANCEL succeeded

and

whiteout:/var/spool/maui/log # tracejob 3648

Job: 3648.whiteout.sf.utas.edu.au

03/07/2007 00:40:01  S    enqueuing into batch, state 1 hop 1
03/07/2007 00:40:01  S    Job Queued at request of
                           prachab at whiteout.sf.utas.edu.au, owner =
                           prachab at whiteout.sf.utas.edu.au, job name = Test2_4C,
                           queue = batch
03/07/2007 00:40:01  A    queue=batch
03/07/2007 11:35:32  S    Job Modified at request of
                           maui at whiteout.sf.utas.edu.au
03/07/2007 11:35:32  S    Job Run at request of maui at whiteout.sf.utas.edu.au
03/07/2007 11:35:33  M    Job Modified at request of
                           PBS_Server at whiteout.sf.utas.edu.au
03/07/2007 11:35:33  S    Job Modified at request of
                           maui at whiteout.sf.utas.edu.au
03/07/2007 11:35:33  A    user=prachab group=users jobname=Test2_4C queue=batch
                           ctime=1173188401 qtime=1173188401 etime=1173188401
                           start=1173227733 exec_host=whiteout
                           Resource_List.mem=2000mb Resource_List.ncpus=1
                           Resource_List.neednodes=whiteout
                           Resource_List.nodect=1
                           Resource_List.walltime=1000:00:00
03/07/2007 11:36:25  S    Job deleted at request of maui at whiteout.sf.utas.edu.au
03/07/2007 11:36:25  S    Job sent signal SIGTERM on delete
03/07/2007 11:36:25  M    kill_task: killing pid 32547 task 1 with sig 15
03/07/2007 11:36:25  M    kill_task: killing pid 32569 task 1 with sig 15
03/07/2007 11:36:25  M    kill_task: killing pid 32574 task 1 with sig 15
03/07/2007 11:36:25  M    kill_task: killing pid 32615 task 1 with sig 15
03/07/2007 11:36:25  A    requestor=maui at whiteout.sf.utas.edu.au
03/07/2007 11:36:28  S    Exit_status=143 resources_used.cput=00:00:46
                           resources_used.mem=300784kb
                           resources_used.vmem=341792kb
                           resources_used.walltime=00:00:52
03/07/2007 11:36:28  M    kill_task: killing pid 32615 task 1 with sig 9
03/07/2007 11:36:28  M    scan_for_terminated: job 3648.whiteout.sf.utas.edu.au
                           task 1 terminated, sid 32547
03/07/2007 11:36:28  M    job was terminated
03/07/2007 11:36:28  A    user=prachab group=users jobname=Test2_4C queue=batch
                           ctime=1173188401 qtime=1173188401 etime=1173188401
                           start=1173227733 exec_host=whiteout
                           Resource_List.mem=2000mb Resource_List.ncpus=1
                           Resource_List.neednodes=batch Resource_List.nodect=1
                           Resource_List.walltime=1000:00:00 session=32547
                           end=1173227788 Exit_status=143
                           resources_used.cput=00:00:46
                           resources_used.mem=300784kb
                           resources_used.vmem=341792kb
                           resources_used.walltime=00:00:52
03/07/2007 11:36:37  S    dequeuing from batch, state COMPLETE


It looks like Maui didn't wait a full 15 minutes before killing the job.
Is there something wrong with my config?
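
As a quick sanity check on the timings, here is a minimal Python sketch
that compares the gap between the job start and the CANCEL action with
the configured grace period. The timestamps are copied from the maui.log
excerpt above; the year (2007, from the tracejob output) and the reading
of 00:15:00 as HH:MM:SS are assumptions made for the calculation.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# "job start time" and the JOBRESVIOLATION timestamp from maui.log.
# The year is an assumption taken from the tracejob output.
job_start = datetime.strptime("2007-03-07 11:35:32", FMT)
cancel_time = datetime.strptime("2007-03-07 11:36:25", FMT)

# Grace period from the 00:15:00 field of RESOURCELIMITPOLICY,
# assumed here to mean HH:MM:SS.
grace = timedelta(hours=0, minutes=15, seconds=0)

# The violation cannot have lasted longer than the job itself had been running.
elapsed = cancel_time - job_start

print(f"time from start to cancel: {elapsed.total_seconds():.0f} s")
print(f"configured grace period:   {grace.total_seconds():.0f} s")
print("cancelled before the grace period expired" if elapsed < grace
      else "grace period had expired")
```

Since the job had only been running for the elapsed time computed here,
the violation itself cannot have lasted any longer than that.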

- Nick

-- 
Nick Sonneveld  |  Nicholas.Sonneveld at utas.edu.au
IT Resources, University of Tasmania, Private Bag 69, Hobart Tas 7001
(03) 6226 6377  |  0407 336 309  |  Fax (03) 6226 7171

