[torquedev] [torqueusers] Question about what does PBS_NUM_NODES and PBS_NUM_PPN means

Glen Beane glen.beane at gmail.com
Tue Dec 7 20:06:23 MST 2010


On Tue, Dec 7, 2010 at 4:55 PM, Martin Siegert <siegert at sfu.ca> wrote:
> On Tue, Dec 07, 2010 at 01:26:20PM -0500, Glen Beane wrote:
>> On Tue, Dec 7, 2010 at 1:18 PM, David Beer <dbeer at adaptivecomputing.com> wrote:
>> >
>> >> the customer isn't always right ;)
>> >>
>> >> really, I don't think we should pollute the codebase with hacks for
>> >> specific customers when there may be a better more general way to do
>> >> something that will have wider use
>> >
>> > I also wish that every time I had to solve a problem for a customer I had time to flesh the idea out with the community, discover the best, most widely applicable solution, and then code that. Unfortunately, that is rarely the case. I believe we've made strong efforts to get the community more involved - I know we still can improve in this - but situations will always arise that just need to be fixed. It's not ideal but it happens.
>>
>>
>> maybe we could keep those types of changes in a branch, or maybe give
>> that customer a patch to solve their immediate need while we work on a
>> more robust solution to push into torque?  I'm not saying things will
>> be perfect,  but adding lots and lots of quick-fixes to satisfy a very
>> small number of sites makes the code more complicated and harder to
>> maintain.
>
> Frankly, I would not like that at all.
> Two cases from the recent past:
> 1) I submitted a patch that would implement an environment variable
>   PBS_NCPUS that would contain the number of processors assigned to
>   the job. It was rejected because of the vague possibility that
>   sometime in the future there may be support for dynamically sized
>   jobs, even though the patch was tiny and I couldn't care less if
>   PBS_NCPUS had to be redefined sometime in the vague future
>   to be "initial value of ...".
> 2) I submitted a patch that would allow routing based on a node
>   specification -l nodes=x1:ppn=y1+x2:ppn=y2+... by calculating the
>   sum x1*y1+x2*y2+... That patch was rejected since this would be
>   fixed some time in the future anyway.
>
> By now I learned that I should not have submitted the patches to
> torque-dev, but to Moab support.
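
For readers unfamiliar with the calculation in case 2, the sum x1*y1+x2*y2+... over a node specification can be sketched as follows. This is only an illustration in Python, not the rejected patch itself: the function name total_procs is hypothetical, the sketch handles only the numeric "count:ppn=N" form quoted above (not hostnames or node properties), and it assumes a chunk without ppn counts one processor per node.

```python
def total_procs(node_spec: str) -> int:
    """Sum count * ppn over every '+'-separated chunk of a node spec.

    Hypothetical helper for illustration; accepts only chunks of the
    form 'x' or 'x:ppn=y' where x and y are integers.
    """
    total = 0
    for chunk in node_spec.split("+"):
        fields = chunk.split(":")
        count = int(fields[0])  # number of nodes requested in this chunk
        ppn = 1                 # assumption: no ppn means 1 processor per node
        for field in fields[1:]:
            if field.startswith("ppn="):
                ppn = int(field[len("ppn="):])
        total += count * ppn
    return total


if __name__ == "__main__":
    # Corresponds to: -l nodes=2:ppn=4+3:ppn=8
    print(total_procs("2:ppn=4+3:ppn=8"))
```

A value computed this way is the kind of number the proposed PBS_NCPUS variable in case 1 would have carried, as described in the message above.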

TORQUE patches should be submitted to bugzilla.  I'm not sure why you
think they should be submitted to Moab support.


> Where should that lead to? Everybody keeps their own little patches
> around, Adaptive Computing keeps their patches and nothing gets
> implemented in torque?

no no no,  I'm not arguing for that AT ALL.  I'm just saying if it is
a quick fix to satisfy the need for ONE customer then does it need to
be checked into the mainline TORQUE, especially if there is a more
general solution that might benefit many sites?  I'm just arguing that
we should at least discuss some of these publicly before they are
implemented in TORQUE.  If Adaptive has input from more than just the
one customer that requested the original change then maybe we could end
up with a better solution that many people might find useful.

