[torqueusers] qalter -lwalltime not propagated to slave moms
David Singleton
David.Singleton at anu.edu.au
Mon Nov 28 13:39:02 MST 2005
I don't think any qalter changes are propagated to the sisters.
Our workaround for walltime changes is to make sure only the mother
superior (MS) enforces the walltime limit.
src/resmom/*/mom_mach.c:
int mom_over_limit(job *pjob)
{
    .....
    } else if (strcmp(pname, "walltime") == 0) {
        /* ANUPBS:
           - only have MS check walltime
           - covers bug: resource modifications are not propagated to the
             sisterhood
           - assumes only walltime being modified (most common)
        */
        /* Not the mother superior: leave walltime enforcement to MS. */
        if ((pjob->ji_qs.ji_svrflags & JOB_SVFLG_HERE) == 0)
            continue;
        retval = local_gettime(pres, &value);
        if (retval != PBSE_NONE)
            continue;
        /* Elapsed wall time since job start, scaled by wallfactor. */
        num = (unsigned long)((double)(time_now - pjob->ji_qs.ji_stime) * wallfactor);
        if (num > value) {
            sprintf(log_buffer, "walltime %lusec exceeded limit %lusec", num, value);
            ret = (JOB_SVFLG_OVERLMT1 | JOB_SVFLG_OVERLMTWALL);
        }
    }
David
Martin Schafföner wrote:
> On Monday 28 November 2005 10:37, Thomas Zeiser wrote:
>
>>Dear All,
>>
>>at least on our cluster, it seems that changes made with qalter to the
>>walltime after a job has started are not correctly propagated to
>>sister moms. As a consequence, parallel jobs started with Pete's
>>mpiexec get killed once the original walltime is exceeded.
>
>
> I don't know the reason for this, but I can at least confirm this "feature" in
> TORQUE 2.0.0p1.
>
> Regards,
--
--------------------------------------------------------------------------
Dr David Singleton ANU Supercomputer Facility
HPC Systems Manager and APAC National Facility
David.Singleton at anu.edu.au Leonard Huxley Bldg (No. 56)
Phone: +61 2 6125 4389 Australian National University
Fax: +61 2 6125 8199 Canberra, ACT, 0200, Australia
--------------------------------------------------------------------------