[torqueusers] qalter -lwalltime not propagated to slave moms

David Singleton David.Singleton at anu.edu.au
Mon Nov 28 13:39:02 MST 2005


I don't think any qalter variations are propagated to sisters.

Our solution for walltime variations is to make sure that only MS
(the Mother Superior mom) applies the walltime limit.


src/resmom/*/mom_mach.c:

int mom_over_limit(job *pjob)
{
  .....

                 } else if (strcmp(pname, "walltime") == 0) {
                         /* ANUPBS:
                                - only have MS check walltime
                                - covers bug: resource modifications are not propagated to the
                                  sisterhood
                                - assumes only walltime being modified (most common)
                          */
                         if ((pjob->ji_qs.ji_svrflags & JOB_SVFLG_HERE) == 0) continue;
                         retval = local_gettime(pres, &value);
                         if (retval != PBSE_NONE) continue;
                         num = (unsigned long)((double)(time_now - pjob->ji_qs.ji_stime)*wallfactor);
                         if (num > value) {
                                 sprintf(log_buffer,"walltime %lusec exceeded limit %lusec",num, value);
                                 ret = (JOB_SVFLG_OVERLMT1|JOB_SVFLG_OVERLMTWALL);
                         }
                 }
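
For readers without the TORQUE tree at hand, the idea can be sketched in
isolation as below. This is only an illustration under stated assumptions,
not the actual patch: struct sketch_job, SKETCH_FLAG_HERE and
sketch_walltime_exceeded() are hypothetical stand-ins for the real job
structure, JOB_SVFLG_HERE and the walltime branch of mom_over_limit().

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for the JOB_SVFLG_HERE bit, assumed here to be
 * set only on the Mother Superior's copy of the job. */
#define SKETCH_FLAG_HERE 0x01

/* Hypothetical, simplified job record; not the real TORQUE job struct. */
struct sketch_job {
    int           svrflags;        /* bit flags, see SKETCH_FLAG_HERE */
    time_t        start_time;      /* when the job started */
    unsigned long walltime_limit;  /* limit in seconds, as known to this mom */
};

/* Returns true only if this mom is Mother Superior and the elapsed wall
 * time, scaled by wallfactor, is strictly greater than the limit.
 * Sister moms always return false, so a stale limit on a sister can
 * never kill the job. */
bool sketch_walltime_exceeded(const struct sketch_job *job,
                              time_t now, double wallfactor)
{
    assert(job != NULL);

    /* Not Mother Superior: skip the walltime check entirely. */
    if ((job->svrflags & SKETCH_FLAG_HERE) == 0)
        return false;

    unsigned long used =
        (unsigned long)((double)(now - job->start_time) * wallfactor);
    return used > job->walltime_limit;
}
```

With this shape, a sister still holding the original (shorter) limit reports
nothing, and enforcement rests on the MS copy of the limit, which is the one
the workaround relies on seeing the qalter change.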

David


Martin Schafföner wrote:
> On Monday 28 November 2005 10:37, Thomas Zeiser wrote:
> 
>>Dear All,
>>
>>at least on our cluster, it seems that changes with qalter to the
>>walltime after the job is started are not correctly propagated to
>>sister moms. As a consequence, parallel jobs started with Pete's
>>mpiexec get killed once the original walltime is exceeded.
> 
> 
> I don't know the reason for this, but I can at least confirm this "feature" in 
> TORQUE 2.0.0p1.
> 
> Regards,


-- 
--------------------------------------------------------------------------
    Dr David Singleton               ANU Supercomputer Facility
    HPC Systems Manager              and APAC National Facility
    David.Singleton at anu.edu.au       Leonard Huxley Bldg (No. 56)
    Phone: +61 2 6125 4389           Australian National University
    Fax:   +61 2 6125 8199           Canberra, ACT, 0200, Australia
--------------------------------------------------------------------------

