[Mauiusers] running jobs & restarting maui
Garrick Staples
garrick at usc.edu
Thu Nov 10 18:29:33 MST 2005
On Thu, Nov 10, 2005 at 12:30:52PM +0100, Thomas Dargel alleged:
> Sorry to bother you again, but updating to the snapshot you
> suggested (3.2.6p14-snap.1129921819) didn't change anything (there is no
> explicit walltime setting, either for pbs_server or in the job scripts):
>
> The jobs were thrown out by the restarted Maui with the same log message:
> 11/10 11:32:27 ALERT: job '601' in state 'Running' has exceeded its wallclock limit (0+S:0) by 12:02:25 (job will be cancelled)
> 11/10 11:32:27 MSysRegEvent(JOBWCVIOLATION: job '601' in state 'Running' has exceeded its wallclock limit (0) by 12:02:25 (job will be cancelled) job start time: Wed Nov 9 23:30:02
> 11/10 11:32:27 INFO: job '601' successfully cancelled
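I think this will do the trick, but it feels hackish to me: it skips the
wallclock check entirely for any job whose WCLimit is 0, which is what
your log shows for job 601. Maybe one of the Maui peeps can say if this
is a good idea.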
diff -pruN maui-3.2.6p14_orig/src/moab/MLimit.c maui-3.2.6p14/src/moab/MLimit.c
--- maui-3.2.6p14_orig/src/moab/MLimit.c 2005-10-21 12:10:17.000000000 -0700
+++ maui-3.2.6p14/src/moab/MLimit.c 2005-11-10 17:20:41.000000000 -0800
@@ -171,6 +171,7 @@ int MLimitEnforceAll(
     JobWCX = J->Cred.C->F.Overrun;
     if ((JobWCX >= 0) &&
+        (J->WCLimit > 0) &&
         (MSched.Time > J->StartTime) &&
         ((unsigned long)(MSched.Time - J->StartTime) > (J->WCLimit + J->SWallTime + JobWCX)))
       {
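
For anyone reading the patch without the source handy, here is a minimal
standalone sketch of what the extra test changes. The struct and both
function names are made up for illustration, and the field types are
assumptions; only the field names mentioned in the comments come from the
diff above. It is not the actual Maui code.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the values the patched test reads.
 * The real Maui job structure is much larger; types here are assumed. */
struct job_times {
    unsigned long wclimit;   /* J->WCLimit: 0 when no walltime was set */
    unsigned long swalltime; /* J->SWallTime */
    long overrun;            /* JobWCX, taken from J->Cred.C->F.Overrun */
    unsigned long start;     /* J->StartTime */
};

/* The condition as it reads before the patch. */
bool exceeded_before_patch(const struct job_times *j, unsigned long now)
{
    return j->overrun >= 0 &&
           now > j->start &&
           (now - j->start) >
               (j->wclimit + j->swalltime + (unsigned long)j->overrun);
}

/* The condition with the added (WCLimit > 0) guard: a job with no
 * wallclock limit is never reported as having exceeded it. */
bool exceeded_after_patch(const struct job_times *j, unsigned long now)
{
    return j->wclimit > 0 && exceeded_before_patch(j, now);
}
```

With a limit of 0 and a job that has been running for a while, the first
function reports a violation and the second does not. Jobs that do have a
nonzero limit are treated the same by both.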
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California