[Mauiusers] running jobs & restarting maui

Garrick Staples garrick at usc.edu
Thu Nov 10 18:29:33 MST 2005


On Thu, Nov 10, 2005 at 12:30:52PM +0100, Thomas Dargel alleged:
> Sorry for bothering you again, but updating to the snapshot you
> suggested (3.2.6p14-snap.1129921819) didn't change anything (there is no
> explicit walltime setting, neither for the pbs_server nor in the job scripts):
> 
> The jobs were thrown out by the restarted maui with the same log-message:
> 11/10 11:32:27 ALERT:    job '601' in state 'Running' has exceeded its wallclock limit (0+S:0) by 12:02:25 (job will be cancelled)
> 11/10 11:32:27 MSysRegEvent(JOBWCVIOLATION:  job '601' in state 'Running' has exceeded its wallclock limit (0) by 12:02:25 (job will be cancelled)  job start time: Wed Nov  9 23:30:02
> 11/10 11:32:27 INFO:     job '601' successfully cancelled

I think this will do the trick, but it feels hackish to me.  Maybe one
of the Maui peeps can say if this is a good idea.

diff -pruN maui-3.2.6p14_orig/src/moab/MLimit.c maui-3.2.6p14/src/moab/MLimit.c
--- maui-3.2.6p14_orig/src/moab/MLimit.c        2005-10-21 12:10:17.000000000 -0700
+++ maui-3.2.6p14/src/moab/MLimit.c     2005-11-10 17:20:41.000000000 -0800
@@ -171,6 +171,7 @@ int MLimitEnforceAll(
       JobWCX = J->Cred.C->F.Overrun;
  
     if ((JobWCX >= 0) &&
+        (J->WCLimit > 0) &&
         (MSched.Time > J->StartTime) &&
        ((unsigned long)(MSched.Time - J->StartTime) > (J->WCLimit + J->SWallTime + JobWCX)))
       {
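
For anyone reading along without the source handy, here is a minimal
standalone sketch of what the extra condition does. It is not Maui code:
the struct, the function name and the field types are made up for
illustration, and it assumes that a WCLimit of 0 means "no limit known",
which is what the "(0+S:0)" in the log above suggests.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the job fields the patch reads. */
struct job_times {
    long          start_time;     /* plays the role of J->StartTime */
    unsigned long wc_limit;       /* plays the role of J->WCLimit; 0 is assumed to mean "unset" */
    unsigned long suspended_time; /* plays the role of J->SWallTime */
};

/* Returns true if the job should be treated as over its wallclock limit. */
bool wallclock_violated(const struct job_times *j, long now, long overrun)
{
    if (overrun < 0)
        return false;   /* negative overrun: enforcement is off */
    if (j->wc_limit == 0)
        return false;   /* the added guard: skip jobs with no limit set */
    if (now <= j->start_time)
        return false;   /* job has not accumulated any run time yet */

    unsigned long elapsed = (unsigned long)(now - j->start_time);
    return elapsed > j->wc_limit + j->suspended_time + (unsigned long)overrun;
}
```

With wc_limit at 0 and 12:02:25 (43345 seconds) elapsed, as for job 601,
this returns false, so the job is left alone. Without the guard the same
comparison is 43345 > 0 and the job gets cancelled.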

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California