[torquedev] New 2.5.6 snapshot

David Beer dbeer at adaptivecomputing.com
Tue Apr 19 10:33:21 MDT 2011



----- Original Message -----
> Hi,
> 
> On Thu, Apr 07, 2011 at 05:05:04PM -0600, Ken Nielson wrote:
> > There is a new snapshot for 2.5.6 available. This fixes a problem with
> > a patch for Bugzilla 116 where the new resource procct was added. If the
> > -l nodes option was not used in a job submission then the job would not
> > be run by Moab because procct was added to the Resource_List attribute
> > and treated like a generic resource by Moab. Because the generic resource
> > procct does not exist Moab never schedules the job.
> >
> > This is now fixed.
> >
> > You can download this snapshot at
> > http://www.clusterresources.com/downloads/torque/snapshots/torque-2.5.6-snap.201104071657.tar.gz
> >
> > Please download and let us know if you find any problems.
> 
> I am afraid this does not work: I haven't traced this back to the
> source routine, but apparently this new version presets the nodes
> resource to 1, correct?
> Thus, if a user only requests -l procs=N, with 2.5.6-snap.201104071657
> procct is set to N+1, not N, see
> 
> resc_def_all.c, line 1118:
> 
> ppct->rs_value.at_val.at_long =
>     count_proc(pnodesp->rs_value.at_val.at_str)
>     + pprocsp->rs_value.at_val.at_long;
> 
> torque-2.5.6-snap.201104041023 actually worked flawlessly for me.
> Which means that I haven't figured out how to trigger the bug that
> torque-2.5.6-snap.201104071657 was supposed to fix.
> Regardless of whether I specified -l nodes=... or -l procs=... or
> neither, Moab always started my job, i.e., the procct resource
> always got removed before the job was sent to Moab, see
> 
> svr_jobfunc.c, line 1965:
> 
> if (strcmp(pque->qu_attr->at_val.at_str, "Execution") == 0)
> {
>     /* job routed to Execution queue successfully */
>     /* unset job's procct resource */
>     resource_def *pctdef;
>     resource *pctresc;
>     pctdef = find_resc_def(svr_resc_def, "procct", svr_resc_size);
>     if ((pctresc = find_resc_entry(&pjob->ji_wattr[JOB_ATR_resource],
>             pctdef)) != NULL)
>         pctdef->rs_free(&pctresc->rs_value);
> }
> }
> 
> If somebody can explain to me how to submit a job that is not caught in
> this if block, I may be able to fix this.
> 
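
To make the N+1 arithmetic reported above concrete, here is a minimal sketch in C. It assumes the nodes resource has been preset to "1" and that procct is the sum of the processors counted from the nodes spec and the procs value, as in the quoted line from resc_def_all.c. count_proc_sketch() and procct_sketch() are hypothetical stand-ins written for this illustration; they are not the real count_proc() and handle only a simplified "nodes[:ppn=K]+..." spec.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for count_proc(): sums nodes * ppn over a
 * '+'-separated nodes spec such as "2:ppn=4+1". Simplified on purpose;
 * this is NOT the TORQUE implementation. */
static long count_proc_sketch(const char *spec)
{
    assert(spec != NULL);

    long total = 0;
    char *copy = malloc(strlen(spec) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, spec);

    for (char *tok = strtok(copy, "+"); tok != NULL; tok = strtok(NULL, "+")) {
        char *end;
        long nodes = strtol(tok, &end, 10);
        if (end == tok)
            nodes = 1;              /* a named host counts as one node */

        long ppn = 1;               /* assumed default of one processor per node */
        const char *p = strstr(tok, "ppn=");
        if (p != NULL)
            ppn = strtol(p + 4, NULL, 10);

        total += nodes * ppn;
    }

    free(copy);
    return total;
}

/* Mirrors the shape of the quoted assignment:
 * procct = count_proc(nodes spec) + procs. */
static long procct_sketch(const char *nodes_spec, long procs)
{
    return count_proc_sketch(nodes_spec) + procs;
}
```

Under these assumptions, a job submitted with only -l procs=8 and a preset nodes value of "1" gets procct_sketch("1", 8), that is 1 + 8 = 9 rather than 8.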

This issue is now resolved. The problem occurred when a job requested no resources and the nodes request was then applied by default. The fix adds code to free the resource after the queue and server defaults are applied. The new snapshot can be found here:

http://www.clusterresources.com/downloads/torque/snapshots/torque-2.5.6-snap.201104191030.tar.gz
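
As a rough illustration of the ordering described above, the following toy model applies defaults first and frees the resource afterwards. It assumes the resource being freed is procct and that applying the default nodes request is what puts it back into the job's resource list. The struct and all function names are hypothetical and do not correspond to the actual TORQUE structures or to the code in the snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a job's resource list; NOT TORQUE's real data structures. */
struct job_sketch {
    bool has_nodes;  long nodes;
    bool has_procs;  long procs;
    bool has_procct; long procct;
};

/* Hypothetical: apply queue/server defaults. If the job requested no
 * resources, a default nodes=1 is applied, and procct is (re)computed. */
static void apply_defaults_sketch(struct job_sketch *j)
{
    assert(j != NULL);
    if (!j->has_nodes && !j->has_procs) {
        j->has_nodes = true;
        j->nodes = 1;
    }
    j->procct = (j->has_nodes ? j->nodes : 0) + (j->has_procs ? j->procs : 0);
    j->has_procct = true;
}

/* Hypothetical: drop procct so the scheduler never sees it as a
 * generic resource. */
static void free_procct_sketch(struct job_sketch *j)
{
    assert(j != NULL);
    j->has_procct = false;
    j->procct = 0;
}

/* The ordering is the point of the sketch: free only after defaults
 * have been applied, so the default path cannot reintroduce procct. */
static void prepare_for_scheduler_sketch(struct job_sketch *j)
{
    apply_defaults_sketch(j);
    free_procct_sketch(j);
}
```

In this model, a job that requests nothing ends up with nodes=1 and no procct entry by the time it is handed to the scheduler.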

-- 
David Beer 
Direct Line: 801-717-3386 | Fax: 801-717-3738
     Adaptive Computing
     1656 S. East Bay Blvd. Suite #300
     Provo, UT 84606
