[torquedev] qalter with blank walltimes sets walltime to zero

David Singleton David.Singleton at anu.edu.au
Wed Jun 18 16:11:36 MDT 2008


Hi Chris,

You may be saying what I understand to be the case, but just to
clarify:

  qalter -lresource= jobid

will "unset" the resource limit for that job so that the limit
will fall back to any queue or server default.  If there are no
queue or server defaults (or limits) then the limit really does get
unset and should not be applied to the job.  The MOM should just
ignore that resource for that job.
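To make that fallback order concrete, here is a minimal C sketch. It is
not Torque code: `limit_t`, `job_limit_from_request()` and
`effective_limit()` are hypothetical names, only plain seconds are
parsed, and the `set` field merely stands in for the role ATR_VFLAG_SET
plays in the real attribute flags.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one resource value: `set` mirrors the role
 * of ATR_VFLAG_SET, `seconds` holds a walltime in seconds. */
typedef struct {
    int  set;
    long seconds;
} limit_t;

/* Hypothetical decoder: an empty value after '=' leaves the job's
 * limit unset rather than storing zero.  Only plain seconds are parsed
 * here; real walltime syntax is more involved. */
limit_t job_limit_from_request(const char *value)
{
    limit_t l = {0, 0};
    if (value != NULL && value[0] != '\0') {
        l.set = 1;
        l.seconds = strtol(value, NULL, 10);
    }
    return l;
}

/* Resolve the limit that applies to the job: the job's own value,
 * else the queue default, else the server default.  Returns 0 when
 * nothing is set anywhere, meaning the resource should not be enforced. */
int effective_limit(limit_t job, limit_t queue_def, limit_t server_def,
                    long *out)
{
    if (job.set)        { *out = job.seconds;        return 1; }
    if (queue_def.set)  { *out = queue_def.seconds;  return 1; }
    if (server_def.set) { *out = server_def.seconds; return 1; }
    return 0;
}
```

For example, `job_limit_from_request("")` yields an unset limit, so
`effective_limit()` returns the queue default if there is one, and
returns 0 (nothing to enforce) when neither the queue nor the server
defines a walltime.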

We haven't "fixed" the case you are describing because I think it's
"logical", but we have made a local mod to the server's
modify_job_attr() function to stop admins from accidentally doing
things like "extending" the walltime limits of running jobs to 100:00
when they meant 100:00:00. Something like:

	if (pjob->ji_qs.ji_state == JOB_STATE_RUNNING) {
		/*
		 *  Probably an erroneous request if job has already used more
		 *  resources than specified in the request - reject
		 */
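		if (pjob->ji_wattr[(int)JOB_ATR_resc_used].at_flags&ATR_VFLAG_SET) {
			/*
			 *  comp_resc() reports its result through the global
			 *  counters comp_resc_gt/eq/lt; a non-zero comp_resc_gt
			 *  means a used value exceeds the newly requested one
			 */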
			comp_resc(&pjob->ji_wattr[(int)JOB_ATR_resc_used],
				  &newattr[(int)JOB_ATR_resource]);
			if (comp_resc_gt)
				rc = PBSE_IVALREQ;
		}
	}
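The same guard can be tried outside the server. The C sketch below is
hypothetical (none of these function names exist in Torque): it assumes
the usual [[HH:]MM:]SS walltime syntax, under which 100:00 means 100
minutes, and rejects a new limit smaller than the walltime a running job
has already used, which is roughly what the comp_resc() test above does
across the job's resource list.

```c
#include <stdlib.h>

/* Hypothetical parser for [[HH:]MM:]SS, returning seconds or -1 on bad
 * input.  Under this reading "100:00" is 100 minutes, not 100 hours. */
long hms_to_seconds(const char *s)
{
    long fields[3];
    int  n = 0;
    char *end;

    if (s == NULL || *s == '\0')
        return -1;
    for (;;) {
        if (n == 3)
            return -1;             /* more than three fields */
        fields[n++] = strtol(s, &end, 10);
        if (end == s || fields[n - 1] < 0)
            return -1;             /* empty or negative field */
        if (*end == '\0')
            break;
        if (*end != ':')
            return -1;
        s = end + 1;
    }
    long total = 0;
    for (int i = 0; i < n; i++)
        total = total * 60 + fields[i];
    return total;
}

/* Mirrors the intent of the check above: while a job is running, refuse
 * a new walltime that is already smaller than the walltime it has used.
 * Returns 0 to accept, -1 to reject (standing in for PBSE_IVALREQ).
 * Empty values ("unset") are out of scope here and are rejected. */
int check_walltime_alter(int running, long used_secs, const char *requested)
{
    long req = hms_to_seconds(requested);
    if (req < 0)
        return -1;
    if (running && used_secs > req)
        return -1;
    return 0;
}
```

With these assumptions, a running job that has used two hours (7200
seconds) has a request of 100:00 rejected, since that parses to 6000
seconds, while 100:00:00 (360000 seconds) is accepted.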

David

> Hi all,
> 
> We've found recently that if you happen to do:
> 
> qalter -l walltime= 12:0:0 12345
> 
> to change job 12345's walltime to 12 hours, the job
> dies immediately and you get an error about an illegal
> job id.
> 
> This is because the space results in the walltime of
> 12345 being set to 0:0:0 and it immediately gets killed
> by pbs_mom (quite understandably).
> 
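An aside on why both symptoms show up at once: after the shell splits
the command on whitespace, the -l argument is just "walltime=" and
"12:0:0" becomes an extra operand. The C sketch below is hypothetical
and is not qalter's real option handling; resource_value() only extracts
the text after '=' in the -l argument and reports where the operands
begin.

```c
#include <string.h>

/* Hypothetical sketch of how "qalter -l walltime= 12:0:0 12345" reaches
 * the client after the shell has split it on whitespace:
 *   argv = { "qalter", "-l", "walltime=", "12:0:0", "12345" }
 * Only "-l NAME=VALUE" is handled; this is not qalter's real parser. */

/* Returns a pointer to VALUE inside the -l argument (possibly ""), or
 * NULL if there is no -l option or its argument has no '='.
 * *first_operand receives the index of the first argument after the
 * option, i.e. where the job ids are expected to start. */
const char *resource_value(int argc, char **argv, int *first_operand)
{
    *first_operand = 1;
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-l") == 0) {
            const char *eq = strchr(argv[i + 1], '=');
            *first_operand = i + 2;
            return eq ? eq + 1 : NULL;
        }
    }
    return NULL;
}
```

With the stray space the value comes back empty and "12:0:0" sits where
a job id is expected, which is presumably what triggers the illegal job
id error, while 12345 receives the empty walltime.
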
> I can't see any reason why setting the walltime to 0
> should be valid, but digging around in the code it looks
> like decode_time() in src/lib/Libattr/attr_fn_time.c
> specifically accepts that as valid and sets the time to 0.
> 
> Unfortunately, replacing that with a goto to the badval
> label results in pbs_server dying with:
> 
> *** glibc detected *** /usr/local/torque-trunk/sbin/pbs_server: double free or corruption (out): 0x00000000023356d0 ***
> 
> :-(
> 
> My only concern is that a blanket ban on walltimes of 0
> may break the default case, where neither the queue nor
> the job sets a default walltime.
> 
> Thoughts?
> 
> cheers,
> Chris

