[torquedev] pbsdsh and number of processors allocated

Craig Macdonald craigm at dcs.gla.ac.uk
Wed Jun 17 11:17:20 MDT 2009


A suggestion for pbsdsh improvement:

pbsdsh can launch processes:
 (a) on specified hosts in the job
 (b) once for every allocated processor on every allocated node in the job
 (c) once on each unique node in the job
 
I'd like to suggest an improvement to the (c) case.  Some job programs 
manage for themselves how many processors to use on a given node (e.g. 
the Hadoop task tracker). However, if you allocate only processors, not 
whole nodes, such a program has to assume how many processors were 
allocated on each sister, and can end up running too many processes on 
a node. For example, my job asked for 12 processors and nodes have 4 
procs each, but one node already had a single-processor job running, so 
at most 3 of its processors can be mine. How should the spawned process 
know this?
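
To make the arithmetic concrete, here is a minimal Python sketch. It 
assumes the job's node file ($PBS_NODEFILE) lists each host once per 
allocated processor. The host names and the 4+4+3+1 split are invented 
for illustration, and procs_per_node is a hypothetical helper, not part 
of TORQUE.

```python
from collections import Counter


def procs_per_node(nodefile_text):
    """Count allocated processors per host.

    Assumes one line per allocated processor, as in a $PBS_NODEFILE.
    """
    hosts = [line.strip() for line in nodefile_text.splitlines() if line.strip()]
    return Counter(hosts)


# Hypothetical 12-processor allocation on 4-processor nodes, where node3
# already runs someone else's single-processor job.
example_nodefile = "\n".join(
    ["node1"] * 4 + ["node2"] * 4 + ["node3"] * 3 + ["node4"] * 1
)

counts = procs_per_node(example_nodefile)
for host in sorted(counts):
    print(host, counts[host])
```

A process started once per node that simply assumes 4 processors would 
oversubscribe node3 and node4 in this allocation.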

Instead, I'd like to propose that pbsdsh -u sets an environment variable 
in each spawned process, giving the number of processors allocated to 
the job on that node. This should be fairly easy, as tm_spawn accepts an 
argument to alter the target environment of the spawned process.
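
As a rough illustration of the idea (in Python rather than pbsdsh's C, 
and without calling the TM API), the launcher would add one extra 
NAME=value entry to the environment it passes for each node. The 
variable name PBS_ALLOCATED_PROCS and the helper build_envp are 
hypothetical, chosen only for this sketch.

```python
def build_envp(base_env, allocated_procs, var_name="PBS_ALLOCATED_PROCS"):
    """Return an envp-style list of NAME=value strings.

    var_name is a made-up name for this sketch, not an existing TORQUE
    variable. allocated_procs is the number of processors the job was
    given on the target node.
    """
    env = dict(base_env)  # copy, so the caller's mapping is untouched
    env[var_name] = str(allocated_procs)
    return [f"{name}={value}" for name, value in sorted(env.items())]


# Environment for the process spawned on a node where the job holds
# 3 processors.
envp = build_envp({"PATH": "/usr/bin"}, 3)
print(envp)
```

The spawned program could then read that variable at start-up instead of 
assuming it owns every processor on the node.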

Craig

