[Mauiusers] running jobs & restarting maui

Dave Jackson jacksond at clusterresources.com
Thu Nov 10 18:53:09 MST 2005


Garrick,

  This patch will work but it is a band-aid.  The real issue is that
Maui should not allow a job in with a negative walltime.  The negative
walltime should be detected and translated into a max walltime value
when it is first detected.
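
A minimal sketch of that load-time normalization, not the actual Maui code.  The
helper name sketch_normalize_wclimit() and the ceiling SKETCH_MAX_WALLTIME are
hypothetical, and treating a zero limit the same as a negative one is an
assumption based on the "(0+S:0)" limit shown in the quoted log:

```c
/* Hypothetical ceiling applied when a job arrives without a usable limit.
 * The value (100 days, in seconds) is an assumption for illustration only. */
#define SKETCH_MAX_WALLTIME 8640000L

/* Return a usable wallclock limit in seconds.
 * A non-positive limit is treated as "unset" and mapped to the ceiling,
 * so later enforcement code never compares against a bogus limit. */
long sketch_normalize_wclimit(long wclimit)
{
  if (wclimit <= 0)
    return SKETCH_MAX_WALLTIME;

  return wclimit;
}
```

Called once where the job is loaded, a helper like this would leave valid limits
(for example 3600) untouched and replace -1 or 0 with the ceiling, which would
make the extra check in the enforcement path unnecessary.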

  The patch has been applied and a new snapshot released.  However, we
should find out why the job made it through MPBSJobLoad() or
MRMJobPostLoad() with a negative walltime limit.

Thanks,
Dave

On Thu, 2005-11-10 at 17:29 -0800, Garrick Staples wrote:
> On Thu, Nov 10, 2005 at 12:30:52PM +0100, Thomas Dargel alleged:
> > Sorry for bothering you again, but updating to the snapshot you
> > suggested (3.2.6p14-snap.1129921819) did not change anything (there is
> > no explicit walltime setting, either for pbs_server or in the job scripts):
> > 
> > The jobs were thrown out by the restarted maui with the same log-message:
> > 11/10 11:32:27 ALERT:    job '601' in state 'Running' has exceeded its wallclock limit (0+S:0) by 12:02:25 (job will be cancelled)
> > 11/10 11:32:27 MSysRegEvent(JOBWCVIOLATION:  job '601' in state 'Running' has exceeded its wallclock limit (0) by 12:02:25 (job will be cancelled)  job start time: Wed Nov  9 23:30:02
> > 11/10 11:32:27 INFO:     job '601' successfully cancelled
> 
> I think this will do the trick, but it feels hackish to me.  Maybe one
> of the Maui peeps can say if this is a good idea.
> 
> diff -pruN maui-3.2.6p14_orig/src/moab/MLimit.c maui-3.2.6p14/src/moab/MLimit.c
> --- maui-3.2.6p14_orig/src/moab/MLimit.c        2005-10-21 12:10:17.000000000 -0700
> +++ maui-3.2.6p14/src/moab/MLimit.c     2005-11-10 17:20:41.000000000 -0800
> @@ -171,6 +171,7 @@ int MLimitEnforceAll(
>        JobWCX = J->Cred.C->F.Overrun;
>   
>      if ((JobWCX >= 0) &&
> +        (J->WCLimit > 0) &&
>          (MSched.Time > J->StartTime) &&
>         ((unsigned long)(MSched.Time - J->StartTime) > (J->WCLimit + J->SWallTime + JobWCX)))
>        {
> 


