[torquedev] [torqueusers] Question about what does PBS_NUM_NODES and PBS_NUM_PPN means

Martin Siegert siegert at sfu.ca
Tue Dec 7 14:55:12 MST 2010

On Tue, Dec 07, 2010 at 01:26:20PM -0500, Glen Beane wrote:
> On Tue, Dec 7, 2010 at 1:18 PM, David Beer <dbeer at adaptivecomputing.com> wrote:
> >
> >> the customer isn't always right ;)
> >>
> >> really, I don't think we should pollute the codebase with hacks for
> >> specific customers when there may be a better more general way to do
> >> something that will have wider use
> >
> > I also wish that every time I had to solve a problem for a customer I had time to flesh the idea out with the community, discover the best, most widely applicable solution, and then code that. Unfortunately, that is rarely the case. I believe we've made strong efforts to get the community more involved - I know we still can improve in this - but situations will always arise that just need to be fixed. It's not ideal but it happens.
> maybe we could keep those types of changes in a branch, or maybe give
> that customer a patch to solve their immediate need while we work on a
> more robust solution to push into torque?  I'm not saying things will
> be perfect,  but adding lots and lots of quick-fixes to satisfy a very
> small number of sites makes the code more complicated and harder to
> maintain.

Frankly, I would not like that at all.
Two cases from the recent past:
1) I submitted a patch that would implement an environment variable
   PBS_NCPUS that would contain the number of processors assigned to
   the job. It was rejected because of the vague possibility that
   sometime in the future there may be support for dynamically sized
   jobs, even though the patch was tiny and I would not have minded
   at all if PBS_NCPUS had to be redefined at that point to mean
   "initial value of ...".
2) I submitted a patch that would allow routing based on a node
   specification -l nodes=x1:ppn=y1+x2:ppn=y2+... by calculating the
   sum x1*y1+x2*y2+... That patch was rejected since this would be
   fixed some time in the future anyway.
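
For illustration, the sum in case 2 could be computed along the lines
below. This is only a minimal Python sketch, not the rejected patch
itself: the function name total_procs is hypothetical, and treating a
missing ppn as 1 and a leading hostname or property as a single node
are assumptions made for the example.

```python
def total_procs(nodes_spec: str, default_ppn: int = 1) -> int:
    """Return x1*y1 + x2*y2 + ... for a spec like 'x1:ppn=y1+x2:ppn=y2'."""
    total = 0
    # Each '+'-separated chunk describes one group of nodes.
    for chunk in nodes_spec.split("+"):
        fields = chunk.split(":")
        # Assumption: a numeric first field is a node count; anything
        # else (hostname or property) is counted as one node.
        count = int(fields[0]) if fields[0].isdigit() else 1
        # Assumption: ppn defaults to default_ppn when not given.
        ppn = default_ppn
        for field in fields[1:]:
            if field.startswith("ppn="):
                ppn = int(field[len("ppn="):])
        total += count * ppn
    return total


if __name__ == "__main__":
    # 2 nodes with 4 processors each plus 3 nodes with 2 each.
    print(total_procs("2:ppn=4+3:ppn=2"))
```

A queue could then route on this single number instead of on the raw
node specification.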

By now I have learned that I should not have submitted the patches to
torque-dev, but to Moab support.

Where should that lead? Everybody keeps their own little patches
around, Adaptive Computing keeps their patches and nothing gets
implemented in torque?

- Martin
