[torquedev] TORQUE 2.2.0 Defaults

Martin Siegert siegert at sfu.ca
Thu Aug 16 17:17:48 MDT 2007


Hi Dave,

On Thu, Aug 16, 2007 at 05:17:50PM -0600, Dave Jackson wrote:
> Garrick,
> 
> > > 3) set resources_available.nodect to automatically allow jobs up to the
> > > number of procs in the cluster
> > 
> > "setting" resources_available.nodect would be incorrect because then it would
> > never be set again.  The point of resources_available.nodect is override what
> > server thinks is correct.
> > 
> > Can we make this depend on node_pack?
> 
>   I don't fully understand your comments about 'it would never be set
> again'.  My main concern is a user of a new 32-node quad-core cluster
> submitting a job with 'qsub -l nodes=128', anticipating PBS's overly
> flexible definition of nodes, and not being able to run his job because
> of a 'mysterious' queue constraint.  I believe sites should be able to
> force a tighter node definition, but by default this type of rejection
> will be confusing to a novice.
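
For reference, the override in question is a qmgr server attribute; a site
that wants 'nodes=128' accepted on a 32-node quad-core cluster would do
something like this (values illustrative):

    # let jobs request up to 128 "nodes" (i.e., procs)
    qmgr -c 'set server resources_available.nodect = 128'

Once it is set by hand the server stops recomputing it from the node list,
which I take to be Garrick's point about it "never being set again".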

You may expect this question from me, so I'll ask it anyway :-)
Can we get "-l procs=n" soon and get rid of this problem once and for all,
in a way that is comprehensible to users?
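
For concreteness, that would let users write (hypothetical syntax, since
procs isn't implemented yet):

    # ask for 128 processors anywhere in the cluster, instead of
    # overloading the meaning of "nodes"
    qsub -l procs=128 job.sh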

Cheers,
Martin

-- 
Martin Siegert
Head, Research Computing
WestGrid Site Lead
Academic Computing Services                phone: 778 782-4691
Simon Fraser University                    fax:   778 782-4242
Burnaby, British Columbia                  email: siegert at sfu.ca
Canada  V5A 1S6

Note: SFU has new phone numbers! 
      Please use the new numbers listed above from now on.

> > > 4) modify configure to not build the GUI by default (configure
> > > --disable-gui)
> > 
> > configure doesn't default to "on"; it looks for the required deps and only
> > builds the GUI if it can.  What is wrong with that?
> 
>   Not a problem if it is working.  I saw a problem yesterday in which a
> CentOS 4.4 system attempted to build the GUI by default and then failed
> due to a TCL library issue.  I take it your preference would be to improve
> the dependency auto-detection?  What will you need: config.log,
> config.status, anything else?
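
For the record, the explicit opt-out and a quick way to inspect what the
TCL check concluded (both standard autoconf usage):

    # skip the GUI entirely
    ./configure --disable-gui

    # or, after a failed run, see what the TCL tests found
    grep -i tcl config.log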
> 
> 
> > > 5) modify pbs_mom to recover jobs by default (ie, default to 'pbs_mom
> > > -r')
> > 
> > That would be incorrect.  At boot, jobs can't be recovered.
> 
>   pbs_mom should be able to detect that case quite easily, since the job's
> processes are gone.  If the processes are still there, the most correct
> 'default' behavior is to try to recover the job.  What exceptions should
> there be to this?  Again, this is only the default behavior and can be
> overridden by any advanced site.
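
The detection Dave describes is cheap; a sketch of the liveness check in
shell (illustrative only, not the actual mom code):

    # SID is the session id the mom recorded for the job before restarting
    if kill -0 "$SID" 2>/dev/null; then
        echo "job processes still alive: reattach and recover the job"
    else
        echo "job processes gone (e.g. after a reboot): clean up / requeue"
    fi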
> 
> Dave
> 
> On Thu, 2007-08-16 at 14:40 -0700, Garrick Staples wrote:
> > On Thu, Aug 16, 2007 at 03:41:23PM -0600, Dave Jackson alleged:
> > > 3) set resources_available.nodect to automatically allow jobs up to the
> > > number of procs in the cluster
> > 
> > "setting" resources_available.nodect would be incorrect because then it would
> > never be set again.  The point of resources_available.nodect is override what
> > server thinks is correct.
> > 
> > Can we make this depend on node_pack?
> > 
> >  
> > > 4) modify configure to not build the GUI by default (configure
> > > --disable-gui)
> > 
> > configure doesn't default to "on"; it looks for the required deps and only
> > builds the GUI if it can.  What is wrong with that?
> > 
> >  
> > > 5) modify pbs_mom to recover jobs by default (ie, default to 'pbs_mom
> > > -r')
> > 
> > That would be incorrect.  At boot, jobs can't be recovered.
> > 
> >  
> > >   Are there issues with these defaults?  Are there additional defaults
> > > which should be set?
> > > 
> > > Thanks,
> > > Dave
> > > 

