[torqueusers] serial submission order on dual processor cluster

Joshua Bernstein jbernstein at penguincomputing.com
Fri Oct 5 15:59:30 MDT 2007


Hi Richard,

> Hopefully this is the best forum for this question. We are running  
> Scyld 4.0 and Taskmaster 2.0 on a Penguin cluster with dual processor AMD nodes. We  
> regularly submit many serial jobs using the following PBS script : (modeled on  
> the sample in the Taskmaster docs)

This place works fine, but don't be afraid to contact Penguin support 
directly for these sorts of questions. That is what we are here for.

> ################
> #PBS -N <job name>
> echo "Running on Node : $BEOWULF_JOB_MAP"
> echo Start Date: `date`
> echo Dir: $PWD
> echo "##########"
> echo ""
> bpsh $BEOWULF_JOB_MAP <executable and arguments> 
> echo ""
> echo "##########"
> echo End Date: `date`
> ###############
> 
> This loads jobs on the machine in the following order (node #) :
> 9 9 8 8 7 7 6 6 5 5 4 4 3 3 37 37 36 36  35 35  34 34 .....
> 
> How do we have it put one job on each node until all are filled and  
> then put the
> second job on.

The order in which the jobs are launched depends on the order each 
node number appears in the BEOWULF_JOB_MAP environment variable. For the 
sake of demonstration, consider a smaller, but still applicable, case 
with a four-process job. If my BEOWULF_JOB_MAP (sometimes abbreviated to 
just BJM) is set to 0:0:1:1, then the first two processes would each be 
started on node 0, followed by the third and fourth processes on node 1.

Order is significant! If I change the order of the BJM, I can place the 
processes in the order I'd like, so consider BEOWULF_JOB_MAP=0:1:0:1. 
Here a process would be placed on node 0, then node 1, then wrapping 
around back to node 0, and then node 1 again.
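As a rough illustration, here is a sketch of how you might reorder a 
fill-up style map into the round-robin form described above. The 
`round_robin_map` function name is mine, not part of Scyld or 
Taskmaster, and BEOWULF_JOB_MAP is normally set by the scheduler, not 
by hand:

```shell
#!/bin/sh
# Hypothetical helper: interleave a fill-up map like 0:0:1:1 into a
# round-robin map like 0:1:0:1, so successive processes land on
# different nodes. Assumes a colon-separated list of node numbers.
round_robin_map() {
    echo "$1" | tr ':' '\n' | awk '
        # Count slots per node, remembering first-seen order.
        { count[$1]++; if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 } }
        END {
            # Find the largest slot count on any node.
            max = 0
            for (i = 1; i <= n; i++)
                if (count[order[i]] > max) max = count[order[i]]
            # Emit one entry per node per pass until all slots are used.
            out = ""
            for (pass = 1; pass <= max; pass++)
                for (i = 1; i <= n; i++)
                    if (pass <= count[order[i]])
                        out = (out == "" ? order[i] : out ":" order[i])
            print out
        }'
}

round_robin_map "0:0:1:1"   # -> 0:1:0:1
```

You could then export the reordered value as BEOWULF_JOB_MAP inside the 
job script before the bpsh line, though whether that interacts cleanly 
with the scheduler's own placement is something to verify on your setup.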

Does that accomplish what you are looking to do?

-Joshua Bernstein
Software Engineer
Penguin Computing
