[torqueusers] Node assignment queue for shared memory computing
jkusznir at gmail.com
Wed Jul 29 17:50:16 MDT 2009
I'm currently working on getting Hadoop running in scheduler mode with
Torque, and I basically need a shared-memory-style node allocation. By
this I mean that when the program requests 4 nodes, it should get 4
unique nodes with all processors on those nodes allocated, AND ideally
the generated machine file should contain only one entry per node.
Unfortunately, I am not able to modify how it requests nodes (for
example, to make it use the :ppn=8 option), so when it requests 4
nodes, it needs 4 physically separate nodes. I tried a few ways to
"outsmart" Hadoop, but all without success.
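For the machine-file half of the problem, one common workaround is to collapse the node file Torque writes (the file named by $PBS_NODEFILE, which lists a host once per allocated processor slot) down to one line per host before handing it to the application. The sketch below does that; it assumes whole nodes were already allocated by some other means, since it does nothing to change how the scheduler places the job.

```python
import os


def unique_hosts(lines):
    """Collapse node-file lines (one per processor slot) into one entry
    per physical host, keeping the order in which hosts first appear."""
    hosts = []
    seen = set()
    for line in lines:
        host = line.strip()
        if host and host not in seen:  # skip blank lines and repeats
            seen.add(host)
            hosts.append(host)
    return hosts


if __name__ == "__main__":
    # PBS_NODEFILE is set by Torque inside a running job; outside a job
    # it is absent and the script prints nothing.
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as f:
            for host in unique_hosts(f):
                print(host)
```

Inside a job script, the printed output can be redirected to a new file and passed to the application in place of the original node file.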
I also see this as a requirement for running hybrid MPI/OpenMP jobs.
When I run such jobs, I want my MPI stack to start only one process
per physical node, and then have OpenMP use "lightweight" threads to
occupy all the cores on that node. I can request nodes=4:ppn=8 in this
case, but the generated machine file that Open MPI gets lists each node
8 times (32 entries in total), so it would start 32 executables, 8 on
each node, when I actually want only 4 executables started.
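A minimal sketch of the hybrid layout, assuming a nodes=4:ppn=8 style allocation where every node receives the same number of slots: count the node-file entries per host, write an Open MPI hostfile with slots=1 per host so only one rank lands on each node, and export OMP_NUM_THREADS set to the per-node slot count. The program name ./hybrid_app and the hostfile path are hypothetical placeholders.

```python
from collections import Counter


def hybrid_layout(nodefile_lines):
    """Return (hosts, threads_per_node) for one MPI rank per node.

    Assumes every node was given the same number of slots (e.g. ppn=8);
    raises ValueError otherwise."""
    counts = Counter(l.strip() for l in nodefile_lines if l.strip())
    hosts = list(counts)  # Counter keeps first-seen order (Python 3.7+)
    per_node = set(counts.values())
    if len(per_node) != 1:
        raise ValueError(f"uneven slots per node: {dict(counts)}")
    return hosts, per_node.pop()


def openmpi_hostfile(hosts):
    # Open MPI hostfile lines have the form "<host> slots=<n>";
    # slots=1 limits each node to a single MPI rank.
    return "".join(f"{h} slots=1\n" for h in hosts)


def mpirun_command(hosts, threads, hostfile_path, program):
    # -x VAR=value exports an environment variable to every rank.
    return ["mpirun", "-np", str(len(hosts)), "--hostfile", hostfile_path,
            "-x", f"OMP_NUM_THREADS={threads}", program]


if __name__ == "__main__":
    # Simulated node file for 2 nodes with 8 slots each.
    lines = ["node01\n"] * 8 + ["node02\n"] * 8
    hosts, threads = hybrid_layout(lines)
    print(openmpi_hostfile(hosts), end="")
    print(" ".join(mpirun_command(hosts, threads, "hosts.txt",
                                  "./hybrid_app")))
```

With 4 nodes this yields -np 4 and OMP_NUM_THREADS=8, i.e. 4 executables with 8 threads each rather than 32 single-threaded processes.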