[torqueusers] Torque and OpenMPI
Gijsbert Wiesenekker
gijsbert.wiesenekker at gmail.com
Tue Jan 13 04:30:52 MST 2009
I have built a two-node Linux cluster, each node with a quad-core CPU,
running Fedora Core 10, Torque and OpenMPI.
I have Torque and OpenMPI working on one node such that when I start a
job with
mpiexec -n 4 a.out
it runs 4 copies of a.out on that node.
(BTW, at first I got this error:
[hostname:01936] [0,0,0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 173
[hostname:01936] pls:tm: failed to poll for a spawned proc, return status = 17002
[hostname:01936] [0,0,0] ORTE_ERROR_LOG: In errno in file rmgr_urm.c at line 462
[hostname:01936] mpiexec: spawn failed with errno=-11
After trying all kinds of combinations of server, queue and qsub
parameters, it turned out that I had to add the following line to my
queue definition:
set queue long resources_default.nodes = 1)
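For reference, the setting can be applied with qmgr roughly like this. It is only a sketch: it assumes the queue is called long as above and that it is run as a Torque manager on the server host. The variable names and the check for qmgr are just for illustration.

```shell
#!/bin/sh
# Queue name "long" and the attribute are taken from the post above;
# the variable names and the guard are illustrative only.
QUEUE="long"
QMGR_CMD="set queue ${QUEUE} resources_default.nodes = 1"

if command -v qmgr >/dev/null 2>&1; then
    # Needs Torque manager/operator rights on the pbs_server host.
    qmgr -c "${QMGR_CMD}"
    # Print the queue definition to confirm the attribute is set.
    qmgr -c "print queue ${QUEUE}"
else
    echo "qmgr not found; would run: qmgr -c \"${QMGR_CMD}\""
fi
```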
My question is how I can configure Torque such that when I start my
program with
mpiexec -n 8 a.out
it starts the job on both nodes, running 4 copies of a.out on each.
Regards,
Gijsbert