[Mauiusers] running jobs & restarting maui
garrick at usc.edu
Tue Nov 8 17:25:28 MST 2005
On Tue, Nov 08, 2005 at 03:18:47PM -0800, Garrick Staples alleged:
> On Wed, Nov 09, 2005 at 09:26:02AM +1100, Chris Samuel alleged:
> > On Tue, 8 Nov 2005 08:20 pm, Thomas Dargel wrote:
> > > 11/08 09:20:41 ALERT: job '561' in state 'Running' has exceeded its
> > > wallclock limit (0+S:0) by 16:43:00 (job will be cancelled)
> > > 11/08 09:20:41 MSysRegEvent(JOBWCVIOLATION: job '561' in state 'Running'
> > > has exceeded its wallclock limit (0) by 16:43:00 (job will be
> > > cancelled) job start
> > Bingo - the jobs are running with a walltime of 0 (i.e. not set). For some
> > reason, Maui considers this to be infinite if the job is submitted whilst
> > Maui is running, but when Maui restarts it sees this as 0 and so kills the
> > job. :-(
> Agreeing 100% with Chris, this would appear to be a definite bug in maui
> that should be fixed. But I'm unable to duplicate the problem with
> maui-3.2.6p14-snap.1129921819. I restart maui and it happily recreates
> the infinite reservation. Which version are you using?
Of course, plan B is to just set a really really long default walltime.
Qmgr: set server resources_default.walltime=9000:00:00
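To put that value in perspective, here is a rough sketch that converts a walltime string to seconds and days. It assumes the usual PBS-style [[HH:]MM:]SS format; the function name walltime_to_seconds is just made up for illustration and is not part of TORQUE or Maui.

```python
def walltime_to_seconds(walltime: str) -> int:
    """Convert a [[HH:]MM:]SS walltime string to seconds.

    Assumes the PBS-style colon-separated format; hypothetical helper,
    not a TORQUE or Maui API.
    """
    seconds = 0
    # Each colon-separated field is worth 60x the field to its right.
    for part in walltime.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


default_walltime = "9000:00:00"
secs = walltime_to_seconds(default_walltime)
print(f"{default_walltime} = {secs} seconds = {secs / 86400} days")
```

So 9000 hours works out to a little over a year, which is effectively unlimited for most jobs while still giving maui a nonzero limit to work with.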
You might even become like me and consider the existence of infinite
walltime jobs to be a bug! If we allowed such a thing on our cluster,
jobs would be forgotten and left to bitrot forever.
Garrick Staples, Linux/HPCC Administrator
University of Southern California