[torquedev] TORQUE 2.2.0 Defaults

Garrick Staples garrick at usc.edu
Thu Aug 16 17:20:25 MDT 2007


On Thu, Aug 16, 2007 at 05:17:50PM -0600, Dave Jackson alleged:
> Garrick,
> 
> > > 3) set resources_available.nodect to automatically allow jobs up to the
> > > number of procs in the cluster
> > 
> > "setting" resources_available.nodect would be incorrect because then it would
> > never be set again.  The point of resources_available.nodect is to override what
> > the server thinks is correct.
> > 
> > Can we make this depend on node_pack?
> 
>   I don't fully understand your comments about 'it would never be set
> again'.  My main concern is a user of a new 32-node quad-core cluster
> submitting a job with 'qsub -l nodes=128' anticipating PBS's overly
> flexible definition of nodes, and not being able to run his job because
> of a 'mysterious' queue constraint.  I believe sites should be able to
> force a tighter node definition but by default, this type of warning
> will be confusing to a novice.

My point is about the "resources_available" attribute, not really about the
behaviour.  The point of the attribute is so that the admin can override the
default behaviour.  If you "set" the attribute by default, then it will only be
correct in certain environments and only until more nodes are added.
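
To make the distinction concrete, here is a minimal sketch. It is not
TORQUE's actual code; effective_nodect, node_procs and admin_override are
hypothetical names. It assumes the server can sum the procs of its known
nodes whenever no override is set:

```python
from typing import Dict, Optional


def effective_nodect(node_procs: Dict[str, int],
                     admin_override: Optional[int] = None) -> int:
    """Return the node count limit the server would enforce.

    node_procs maps a (hypothetical) node name to its processor count.
    admin_override stands in for an explicitly set
    resources_available.nodect; None means the admin never set it.
    """
    if admin_override is not None:
        # An explicit value always wins, even if it has gone stale.
        return admin_override
    # Otherwise derive the limit from the current node list every time.
    return sum(node_procs.values())


# 32 quad-core nodes, nothing set: the limit follows the cluster.
cluster = {f"node{i:02d}": 4 for i in range(32)}
computed_before = effective_nodect(cluster)

# Eight more nodes are added later.
cluster.update({f"node{i:02d}": 4 for i in range(32, 40)})
computed_after = effective_nodect(cluster)

# A value stored once at install time does not follow the cluster.
stored_after = effective_nodect(cluster, admin_override=128)
```

With the computed default the limit grows from 128 to 160 when the nodes
are added.  A value written into the attribute at install time stays at
128 until someone changes it by hand.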


> > > 4) modify configure to not build the GUI by default (configure
> > > --disable-gui)
> > 
> > configure doesn't default to "on", it looks for the required deps and only
> > builds it if it can.  What is wrong with that?
> 
>   Not a problem if it is working.  I saw a problem yesterday in which a
> CentOS 4.4 system attempted to build the GUI by default then failed due
> to a TCL library issue.  I take it your preference would be to improve
> the dependency auto-detect capability?  What will you need?  config.log,
> config.status? other?

Indeed.  Let's fix the bugs and not throw out normal behaviour.

I had thought trunk was doing a pretty good job of this.  I have centos 4 here,
so can probably replicate any problems.  Please send config.log and 'rpm -qa |
egrep ^tcl\|^tk'.


> > > 5) modify pbs_mom to recover jobs by default (ie, default to 'pbs_mom
> > > -r')
> > 
> > That would be incorrect.  At boot, jobs can't be recovered.
> 
>   pbs_mom should be able to detect that quite easily since the process is 
> gone.  If the process is there, the most correct 'default' behavior should
> be to try to recover the job.  What exceptions should there be to this?  
> Again, this is default behavior and can be overridden by any advanced site.

How does pbs_mom know the process is gone?  It can't check the pids because
they might be reused by new processes after the boot.
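
For illustration only, one way a recovery check could guard against pid
reuse is to record more than the pid when the job starts.  The sketch
below is not how pbs_mom works; SavedJob, can_recover and their fields are
hypothetical.  It assumes the mom saved a boot identifier and the process
start time with each job (on Linux these could come from
/proc/sys/kernel/random/boot_id and the starttime field of
/proc/<pid>/stat):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SavedJob:
    """What a (hypothetical) mom wrote to disk when the job started."""
    pid: int
    boot_id: str      # identifies the boot during which the job started
    start_ticks: int  # start time of the job's process, as recorded then


def can_recover(job: SavedJob,
                current_boot_id: str,
                live_start_ticks: Optional[int]) -> bool:
    """Decide whether the process now holding job.pid is still the job.

    live_start_ticks is the start time of whatever process currently
    owns job.pid, or None if no such process exists.
    """
    if job.boot_id != current_boot_id:
        # The machine rebooted; nothing from the old boot survives,
        # whatever the pid table says now.
        return False
    if live_start_ticks is None:
        # Same boot, but the process has exited.
        return False
    # Same boot and the pid exists: it is the job only if the start
    # time matches, otherwise the pid was reused by a newer process.
    return live_start_ticks == job.start_ticks


job = SavedJob(pid=4242, boot_id="boot-A", start_ticks=1000)

same_process = can_recover(job, "boot-A", 1000)
after_reboot = can_recover(job, "boot-B", 1000)
pid_reused = can_recover(job, "boot-A", 5000)
process_gone = can_recover(job, "boot-A", None)
```

The pid alone is never trusted here: a reboot, a missing process, or a
mismatched start time each rule out recovery.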
