Bug 6 - pbsdsh -u adds an envar with the number of processes started
Status: NEW
Product: TORQUE
Component: clients
Version: 2.4.x
Platform: All All
Importance: P5 enhancement
Assigned To: Glen
 
Reported: 2009-06-17 17:37 MDT by Joshua Bernstein
Modified: 2009-06-17 17:37 MDT (History)

Description Joshua Bernstein 2009-06-17 17:37:21 MDT
A suggestion for pbsdsh improvement:

pbsdsh allows processes to be launched on either:
 (a) specified hosts in the job
 (b) once for every allocated processor on every allocated node in the job
 (c) all unique nodes in the job

I'd like to suggest an improvement to case (c). Some job programs
manage the number of processors to use on a given node themselves (e.g.
the Hadoop task tracker). However, if you allocate only processors, not
whole nodes, this can leave too many processes running on a given node,
because the program has to assume how many processors were allocated on
each sister. For example, my job asked for 12 processors and nodes have
4 procs each, but one node already had a single-processor job running.
How should the spawned process know this?

Instead, I'd like to propose that pbsdsh -u set an environment variable
in each spawned process, detailing the number of allocated
processes. This should be fairly easy, as tm_spawn accepts an argument
to alter the target environment of the spawned process.
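
To make the idea concrete, here is a rough sketch of the two pieces
pbsdsh would need: counting how many allocated processors belong to a
node, and formatting that count as a "NAME=value" string. This is not
taken from the TORQUE source. The variable name PBS_PROCS_ON_NODE, both
function names, and the idea that the allocation is available as a flat
list of host names (one entry per allocated processor) are assumptions
made only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variable name; the proposal does not specify one. */
#define PROCS_ENV_NAME "PBS_PROCS_ON_NODE"

/* Count the entries in hosts[] that match node. Assumes hosts[] holds
 * one entry per allocated processor, so the count is the number of
 * processors this job was given on that node. */
int count_procs_on_node(const char *const hosts[], int nhosts,
                        const char *node)
{
    int count = 0;

    assert(hosts != NULL && node != NULL);
    for (int i = 0; i < nhosts; i++) {
        if (strcmp(hosts[i], node) == 0)
            count++;
    }
    return count;
}

/* Write "PBS_PROCS_ON_NODE=<nprocs>" into buf.
 * Returns 0 on success, -1 if buf is too small. */
int format_procs_env(char *buf, size_t buflen, int nprocs)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, buflen, "%s=%d", PROCS_ENV_NAME, nprocs);
    return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}
```

In the 12-processor example above, a node that could only contribute 3
processors would yield the string "PBS_PROCS_ON_NODE=3". pbsdsh would
then place that string in the NULL-terminated environment array it
hands to tm_spawn for the process started on that node.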

This is pulled from:

http://www.clusterresources.com/pipermail/torquedev/2009-June/001583.html