[torquedev] pbsdsh and number of processors allocated

Garrick garrick at usc.edu
Thu Jun 18 08:35:58 MDT 2009


Nothing exists yet, but I imagine a signal would be necessary. SIGHUP
would work, but we should use something whose default action isn't to
terminate the process. SIGWINCH would be funnily appropriate.
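
A minimal sketch of how a job process might react to such a signal, in
Python for brevity. Nothing here exists in TORQUE: PBS_CPUCOUNTFILE is the
hypothetical name from the quoted message below, and the sketch assumes
pbs_mom would rewrite that file and then send SIGWINCH to the job's
processes. The handler only sets a flag; the file is re-read from the
main loop.

```python
import os
import signal

# Hypothetical: neither this variable nor the notification exists in TORQUE.
COUNT_FILE_ENV = "PBS_CPUCOUNTFILE"

_resized = False  # set by the signal handler, cleared by the main loop


def _on_sigwinch(signum, frame):
    """Record that the allocation may have changed; do no I/O here."""
    global _resized
    _resized = True


def read_cpu_count(default=1):
    """Return the integer stored in the count file, or default if unusable."""
    path = os.environ.get(COUNT_FILE_ENV)
    if not path:
        return default
    try:
        with open(path) as fh:
            return int(fh.read().strip())
    except (OSError, ValueError):
        return default


def install_handler():
    """Install the handler; SIGWINCH's default action is to be ignored."""
    signal.signal(signal.SIGWINCH, _on_sigwinch)


def poll_resize(current):
    """Call from the main loop: return the new count if a signal arrived."""
    global _resized
    if _resized:
        _resized = False
        return read_cpu_count(default=current)
    return current
```

A process would call install_handler() once at startup and poll_resize()
on each pass through its work loop. Processes that never install a
handler are unaffected, because SIGWINCH is ignored by default.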

HPCC/Linux Systems Admin

On Jun 18, 2009, at 4:05 AM, Craig Macdonald <craigm at dcs.gla.ac.uk>  
wrote:

> Hi Garrick,
>
> Would there be some mechanism for notifying running processes of a
> change? Or would they be expected to poll a file? Let's say it's
> called $PBS_CPUCOUNTFILE, and just contains the number of processors
> that the job has been allocated.
>
> Craig
>
> Garrick wrote:
>> A few words of caution: one day we will have dynamically sized jobs,
>> and putting values like number of nodes or jobs into env vars wouldn't
>> be valid. These values need to be queryable from pbs_mom or read
>> from disk (like $PBS_NODEFILE).
>>
>> HPCC/Linux Systems Admin
>>
>> On Jun 17, 2009, at 10:17 AM, Craig Macdonald  
>> <craigm at dcs.gla.ac.uk> wrote:
>>
>>> A suggestion for pbsdsh improvement:
>>>
>>> pbsdsh allows processes to be launched on either:
>>> (a) specified hosts in the job
>>> (b) once for every allocated processor on every allocated node in
>>> the job
>>> (c) all unique nodes in the job
>>>
>>> I'd like to suggest an improvement to the (c) case. Some job programs
>>> manage the number of processors to use on a given node (e.g. the
>>> Hadoop task tracker). However, if you allocate only processors, not
>>> whole nodes, then this can end up with too many processes running on
>>> a given node, because the program has to make assumptions about the
>>> number of processors allocated per sister (e.g. my job asked for 12
>>> processors. Nodes have 4 procs each, but one node already had a
>>> single-processor job running. How should the spawned process know
>>> this?)
>>>
>>> Instead, I'd like to propose that pbsdsh -u sets an environment
>>> variable in the resulting spawned processes, detailing the number of
>>> allocated processors. This should be fairly easy, as tm_spawn accepts
>>> an argument to alter the target environment of the spawned process.
>>>
>>> Craig
>>> _______________________________________________
>>> torquedev mailing list
>>> torquedev at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torquedev
>
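
For the per-node count that motivated the thread, one option that works
with what already exists is to count how often the local host appears in
$PBS_NODEFILE, since that file lists a host once per allocated processor.
A rough sketch in Python, assuming the names in the file match the short
hostname (sites with different naming would need to adjust the
comparison); the function names are illustrative only.

```python
import os
import socket
from collections import Counter


def short_name(host):
    """Strip any domain part so 'n01.example.org' and 'n01' compare equal."""
    return host.strip().split(".", 1)[0]


def slots_per_host(nodefile_path):
    """Count lines per host; each line represents one allocated processor."""
    with open(nodefile_path) as fh:
        return Counter(short_name(line) for line in fh if line.strip())


def local_slots(nodefile_path=None, hostname=None):
    """Processors allocated to this job on the local node (0 if not listed)."""
    path = nodefile_path or os.environ["PBS_NODEFILE"]
    host = short_name(hostname or socket.gethostname())
    return slots_per_host(path)[host]
```

In the 12-processor example above, a process started by pbsdsh -u on the
partly used node would see 3 here rather than assuming 4. With dynamically
sized jobs the file would have to be re-read whenever the allocation
changes.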

