[torquedev] [torqueusers] Question about what does PBS_NUM_NODES and PBS_NUM_PPN means

Ken Nielson knielson at adaptivecomputing.com
Tue Dec 7 13:24:11 MST 2010


On 12/07/2010 10:52 AM, Glen Beane wrote:
> On Tue, Dec 7, 2010 at 11:41 AM, David Beer<dbeer at adaptivecomputing.com>  wrote:
>>> This kind of limits the usefulness of this information... A user can
>>> get more accurate information by parsing the nodefile, but if we
>>> wanted to make this information easier to get why not put it in a
>>> file? One line per node allocated, the format could be something
>>> like:
>>>
>>> $PBS_NODENUM:ppn
>>>
>>> so for a job that requested nodes=4:ppn=16 you would end up with a
>>> file like this:
>>>
>>> 0:16
>>> 1:16
>>> 2:16
>>> 3:16
>>>
>>>
>>> then we just set an environment variable that points to the location of
>>> this file.
>>>
>>> However, this idea probably has a few problems as well -- I still
>>> think it is better than a static ENV variable. I think in the future
>>> there might be a concept of a dynamically sized job that can
>>> grow/shrink; in that case at least the pbs_mom can rewrite the file,
>>> but there might be a better way to convey that information.
>>>
>>> This is the type of change that should be discussed by the TORQUE
>>> community before it is made -- the approach clearly has
>>> limitations; perhaps we could have come up with a better solution by
>>> just spending a little time talking about it first.
>>>
>>>        
>> This wasn't really designed to be a widely-used feature to make things easier - it was a quick solution to a specific site's use case. It took about 30 minutes to implement in TORQUE, and it doesn't affect anyone who doesn't want to use it. If there is a need to make the $PBS_NODEFILE information more accessible, then that is a different discussion. This is just an easy solution for a customer.
>>      
>
> the customer isn't always right ;)
>
> really, I don't think we should pollute the codebase with hacks for
> specific customers when there may be a better, more general way to do
> something that will have wider use
Glen makes a good point. I like the simplicity of the solution as well. 
I would bet we could persuade the customer to change to the new 
convention. In the long run I think it would work out better.

Ken
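
For reference, the per-node file format proposed in the quoted message (one "index:ppn" line per allocated node) can already be derived from the nodefile. Below is a minimal sketch, assuming $PBS_NODEFILE lists one hostname per allocated processor slot and that the order in which hosts first appear corresponds to $PBS_NODENUM. The function name summarize_nodefile is hypothetical and is not part of TORQUE.

```python
import os
from collections import Counter


def summarize_nodefile(lines):
    """Return 'index:ppn' strings, one per distinct host, in first-seen order.

    Assumption: each input line is a hostname, repeated once per processor
    slot allocated on that host, as in the file named by $PBS_NODEFILE.
    """
    # Counter keeps insertion order (Python 3.7+), so the index follows
    # the order in which hosts first appear in the nodefile.
    counts = Counter(line.strip() for line in lines if line.strip())
    return [f"{index}:{ppn}" for index, ppn in enumerate(counts.values())]


if __name__ == "__main__":
    # Only runs inside a job, where the batch system sets PBS_NODEFILE.
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as handle:
            print("\n".join(summarize_nodefile(handle)))
```

Unlike a static environment variable, this reports the real slot count for each host, so it stays accurate when nodes in the same job have different ppn values.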