[torqueusers] qalter -lwalltime not propagated to slave moms
Garrick Staples
garrick at usc.edu
Mon Nov 28 15:08:35 MST 2005
Definitely a bug that changes aren't propagated to sisters, but your
change for walltime seems reasonable anyways. I'll commit it.
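
For anyone reading this in the archive, here is a minimal standalone sketch
of the idea behind the quoted change: only the mother superior enforces
walltime, so a sister holding a stale limit after qalter never kills the job.
The names sketch_job, SKETCH_FLAG_HERE and sketch_walltime_exceeded are
hypothetical stand-ins invented for this example, not the TORQUE definitions,
and the flag value is an assumption.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in for JOB_SVFLG_HERE; the value is assumed. */
#define SKETCH_FLAG_HERE 0x1 /* set only on the mother superior's copy */

/* Hypothetical, simplified view of one node's copy of a job. */
struct sketch_job {
    int           svrflags;       /* role flags for this node's copy */
    time_t        start_time;     /* when the job started */
    unsigned long walltime_limit; /* limit in seconds as known to this node */
};

/* Returns 1 if this node should report the job as over its walltime limit. */
static int sketch_walltime_exceeded(const struct sketch_job *job,
                                    time_t now, double wallfactor)
{
    unsigned long used;

    assert(job != NULL);

    /* Sisters skip the check: their limit may be stale after qalter. */
    if ((job->svrflags & SKETCH_FLAG_HERE) == 0)
        return 0;

    /* Guard against clock skew so the cast below never sees a negative. */
    if (now < job->start_time)
        return 0;

    used = (unsigned long)((double)(now - job->start_time) * wallfactor);
    return used > job->walltime_limit;
}
```

With a 60 second limit and 100 seconds elapsed, the function reports an
overrun on the mother superior and nothing on a sister, which is the
behaviour the workaround relies on.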
On Tue, Nov 29, 2005 at 07:39:02AM +1100, David Singleton alleged:
>
> I don't think any qalter variations are propagated to sisters.
>
> Our solution for walltime variations is to make sure only MS
> applies the walltime limit.
>
>
> src/resmom/*/mom_mach.c:
>
> int mom_over_limit(job *pjob)
> {
>     .....
>
>     } else if (strcmp(pname, "walltime") == 0) {
>         /* ANUPBS:
>            - only have MS check walltime
>            - covers bug: resource modifications are not
>              propagated to the sisterhood
>            - assumes only walltime being modified (most common)
>         */
>         if ((pjob->ji_qs.ji_svrflags & JOB_SVFLG_HERE) == 0)
>             continue;
>         retval = local_gettime(pres, &value);
>         if (retval != PBSE_NONE)
>             continue;
>         num = (unsigned long)((double)(time_now - pjob->ji_qs.ji_stime) * wallfactor);
>         if (num > value) {
>             sprintf(log_buffer, "walltime %lusec exceeded limit %lusec", num, value);
>             ret = (JOB_SVFLG_OVERLMT1 | JOB_SVFLG_OVERLMTWALL);
>         }
>     }
>
> David
>
>
> Martin Schafföner wrote:
> >On Monday 28 November 2005 10:37, Thomas Zeiser wrote:
> >
> >>Dear All,
> >>
> >>at least on our cluster, it seems that changes with qalter to the
> >>walltime after the job is started are not correctly propagated to
> >>sister moms. As a consequence, parallel jobs started with Pete's
> >>mpiexec get killed once the original walltime is exceeded.
> >
> >
> >I don't know the reason for this, but I can at least confirm this
> >"feature" in TORQUE 2.0.0p1.
> >
> >Regards,
>
>
> --
> --------------------------------------------------------------------------
> Dr David Singleton ANU Supercomputer Facility
> HPC Systems Manager and APAC National Facility
> David.Singleton at anu.edu.au Leonard Huxley Bldg (No. 56)
> Phone: +61 2 6125 4389 Australian National University
> Fax: +61 2 6125 8199 Canberra, ACT, 0200, Australia
> --------------------------------------------------------------------------
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California