[Mauiusers] running jobs & restarting maui

Thomas Dargel td at chemie.hu-berlin.de
Thu Nov 10 04:30:52 MST 2005


On Wed, Nov 09, 2005 at 07:27:06AM +0100, Thomas Dargel wrote:
> On Tue, Nov 08, 2005 at 03:18:47PM -0800, Garrick Staples wrote:
> > On Wed, Nov 09, 2005 at 09:26:02AM +1100, Chris Samuel alleged:
> > > On Tue, 8 Nov 2005 08:20 pm, Thomas Dargel wrote:
> > > 
> > > > 11/08 09:20:41 ALERT:    job '561' in state 'Running' has exceeded its
> > > > wallclock limit (0+S:0) by 16:43:00 (job will be cancelled)
> > > > 11/08 09:20:41 MSysRegEvent(JOBWCVIOLATION:  job '561' in state 'Running'
> > > > has exceeded its wallclock limit (0) by 16:43:00 (job will be
> > > > cancelled)  job start 
> > > 
> > > Bingo - the jobs are running with a walltime of 0 (i.e. not set). For some 
> > > reason, Maui considers this to be infinite if the job is submitted 
> > > whilst Maui is running, but when it restarts it sees this as 0 and so 
> > > kills the job. :-(
> > 
> > Agreeing 100% with Chris, this would appear to be a definite bug in maui
> > that should be fixed.  But I'm unable to duplicate the problem with
> > maui-3.2.6p14-snap.1129921819.  I restart maui and it happily recreates
> > the infinite reservation.  Which version are you using?
> > 
> > -- 
> > Garrick Staples, Linux/HPCC Administrator
> > University of Southern California
> 
> Hi Garrick,
> 
> my maui installation is 3.2.6p13, I will try to update to your
> snapshot release and test this behavior again.
> 
> Thanks,
> 
> Thomas.
> 

Sorry for bothering you again, but updating to the snapshot you
suggested (3.2.6p14-snap.1129921819) did not change anything. No explicit
walltime is set, neither for the pbs_server nor in the job scripts.

The restarted maui cancelled the jobs with the same log message:
11/10 11:32:27 ALERT:    job '601' in state 'Running' has exceeded its wallclock limit (0+S:0) by 12:02:25 (job will be cancelled)
11/10 11:32:27 MSysRegEvent(JOBWCVIOLATION:  job '601' in state 'Running' has exceeded its wallclock limit (0) by 12:02:25 (job will be cancelled)  job start time: Wed Nov  9 23:30:02
11/10 11:32:27 INFO:     job '601' successfully cancelled
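
Note that the reported overrun (12:02:25) is exactly the time between the job
start and the log entry, which fits a limit of 0 being applied after the
restart. A minimal Python sketch of that check follows. It assumes the event
line has the format shown above; the log carries no year, so the year is
passed in, and the names EVENT_RE, parse_hms and check_violation are made up
for this illustration and are not part of maui.

```python
import re
from datetime import datetime, timedelta

# Matches the JOBWCVIOLATION event line in the format quoted above.
EVENT_RE = re.compile(
    r"(?P<stamp>\d{2}/\d{2} \d{2}:\d{2}:\d{2}) MSysRegEvent\(JOBWCVIOLATION:\s+"
    r"job '(?P<job>[^']+)' in state '\w+' has exceeded its "
    r"wallclock limit \((?P<limit>\d+)\) by (?P<over>\d+:\d{2}:\d{2}) "
    r".*job start time: \w{3} (?P<start>\w{3}\s+\d+ \d{2}:\d{2}:\d{2})"
)


def parse_hms(text):
    """Convert 'H:MM:SS' (hours may exceed 24) to a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def check_violation(line, year):
    """Return details of one violation event, or None if the line differs.

    Assumption: job start and log entry fall in the same calendar year.
    """
    match = EVENT_RE.search(line)
    if match is None:
        return None
    logged = datetime.strptime(f"{year}/{match['stamp']}", "%Y/%m/%d %H:%M:%S")
    start_text = " ".join(match["start"].split())  # collapse 'Nov  9' spacing
    started = datetime.strptime(f"{year} {start_text}", "%Y %b %d %H:%M:%S")
    limit = int(match["limit"])
    overrun = parse_hms(match["over"])
    elapsed = logged - started
    return {
        "job": match["job"],
        "limit": limit,
        "overrun": overrun,
        "elapsed": elapsed,
        # True when the whole runtime is counted as overrun of a 0 limit.
        "zero_limit": limit == 0 and overrun == elapsed,
    }
```

Calling check_violation() with the MSysRegEvent line for job '601' and the
year 2005 should report zero_limit as True if the reading above is right.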

Jobs with '-l walltime=2400:00:00' were not affected.

The goal is for the maui scheduler to pick up the 'infinite' value implied by
the unset 'resources_default.walltime' in the pbs_server configuration.

Do you have default walltime restrictions on any of your queues or jobs? Perhaps
that is the reason why you cannot duplicate my problem.

By the way, I forgot to give details of my system:
x86_64 SLES9 SP2 machine, torque 1.2.0p6 w. maui-3.2.6p13 / 3.2.6p14-snap.1129921819

Do you have any hint other than setting 'resources_default.walltime=9000:00:00'?
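
For reference, that fallback could be applied through qmgr roughly as
sketched below. This is only an illustration: the helper name
qmgr_walltime_args is made up, and it just builds the argument list for
"qmgr -c" without running anything. Passing a queue name is optional, and any
queue name used with it is a placeholder for a local one.

```python
import re

# Walltime in the [H...]H:MM:SS form used elsewhere in this message.
WALLTIME_RE = re.compile(r"^\d+:[0-5]\d:[0-5]\d$")


def qmgr_walltime_args(walltime, queue=None):
    """Build the qmgr arguments that set a default walltime.

    With queue=None the server-wide default is set; otherwise the
    default of the named queue.
    """
    if not WALLTIME_RE.match(walltime):
        raise ValueError(f"unexpected walltime format: {walltime!r}")
    target = f"queue {queue}" if queue else "server"
    directive = f"set {target} resources_default.walltime = {walltime}"
    return ["qmgr", "-c", directive]
```

The returned list can be handed to subprocess.run() on the pbs_server host by
a user with manager rights.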

Thanks,

 Thomas.


