[torqueusers] Node assignment queue for shared memory computing

Jim Kusznir jkusznir at gmail.com
Wed Jul 29 17:50:16 MDT 2009

Hi all:

I'm currently working on getting Hadoop running in scheduler mode with
Torque, and basically need a shared memory node allocation.  By this,
I mean that when the program requests nodes=4, it means 4 unique nodes
with all processors on those nodes allocated, AND ideally the
generated machine file contains only one entry for each node.

Unfortunately, I am not able to modify how it requests nodes (such as
making it use the :ppn=8 option), so when it requests nodes=4, it needs
4 physically separate nodes.  I tried a few ways to "outsmart" Hadoop,
but all without success.

I also see this as required for running hybrid MPI/OpenMP jobs.  When
I run such jobs, I want my MPI stack to start only one process per
physical node, and then have OpenMP run "lightweight" threads to
use all the cores on that system.  I can request nodes=4:ppn=8 in this
case, but the generated machine file that OpenMPI gets then lists 32
entries (8 per node), so it would start 32 executables, 8 on each
node, when I actually want only 4 executables started.
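
To make the machine file problem concrete, here is a minimal sketch (in
Python) of one possible workaround: collapsing the node file that Torque
generates into one entry per host.  It assumes the job runs under Torque,
so that the PBS_NODEFILE environment variable points at the node file and
that file repeats each hostname once per allocated processor.  The
function names and the default output name machines.unique are made up
for illustration; this does nothing about how Hadoop itself requests
nodes.

```python
import os
import sys


def unique_hosts(lines):
    """Return hostnames in first-seen order, one entry per host."""
    seen = set()
    hosts = []
    for line in lines:
        host = line.strip()
        # Skip blank lines and hosts we have already recorded.
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


def write_machinefile(nodefile_path, out_path):
    """Collapse a per-processor node file into a per-host machine file."""
    with open(nodefile_path) as src:
        hosts = unique_hosts(src)
    with open(out_path, "w") as dst:
        dst.write("".join(host + "\n" for host in hosts))
    return hosts


if __name__ == "__main__":
    # Torque sets PBS_NODEFILE inside a running job; outside a job
    # there is nothing to collapse, so just say so.
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile:
        # Output name is a hypothetical default, overridable via argv[1].
        out = sys.argv[1] if len(sys.argv) > 1 else "machines.unique"
        hosts = write_machinefile(nodefile, out)
        print(f"{len(hosts)} unique hosts written to {out}")
    else:
        print("PBS_NODEFILE is not set; run this inside a Torque job")
```

With a nodes=4:ppn=8 allocation, the 32-line node file would shrink to 4
lines, and the resulting file could then be handed to the MPI launcher as
its host file (for Open MPI, something like mpirun --hostfile
machines.unique -np 4, with OMP_NUM_THREADS=8 set for the OpenMP side).
Whether the launcher honors that file over the allocation it reads from
Torque directly depends on how the MPI stack was built, so treat this as
a starting point to test rather than a confirmed fix.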


