[torqueusers] Torque and OpenMPI

Gijsbert Wiesenekker gijsbert.wiesenekker at gmail.com
Tue Jan 13 04:30:52 MST 2009


I have built a two-node Linux cluster, each node with a quad-core CPU, running
Fedora Core 10, Torque and OpenMPI.
I have Torque and OpenMPI working on a single node: when I start a job with

    mpiexec -n 4 a.out

it runs 4 copies of a.out on that node.

By the way, along the way I got the following error:

    [hostname:01936] [0,0,0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 173
    [hostname:01936] pls:tm: failed to poll for a spawned proc, return status = 17002
    [hostname:01936] [0,0,0] ORTE_ERROR_LOG: In errno in file rmgr_urm.c at line 462
    [hostname:01936] mpiexec: spawn failed with errno=-11

After trying all kinds of combinations of server, queue and qsub
parameters, it turned out that I had to add the following line to my
queue definition:

    set queue long resources_default.nodes = 1
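
For reference, the queue setting mentioned above can be applied without opening an interactive qmgr session. This is a minimal sketch, assuming it is run on the Torque server host by a user with Torque manager privileges, and that the queue is named "long" as in the line above:

```shell
# Set the default node count for jobs submitted to queue "long"
# (requires Torque manager/operator privileges on the server host).
qmgr -c "set queue long resources_default.nodes = 1"

# Print the queue's attributes to confirm the default was stored.
qmgr -c "list queue long"
```
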

My question: how can I configure Torque so that when I start my program
with

    mpiexec -n 8 a.out

it starts the job on both nodes, running 4 copies of a.out on each?
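
For context, a commonly used Torque setup for this kind of run is sketched below. It is only a sketch under stated assumptions: the host names node01/node02 and the script name job.sh are hypothetical, the server's nodes file is assumed to declare 4 processors per host, and OpenMPI is assumed to have been built with Torque (tm) support so that mpiexec takes its host list from the job's allocation:

```shell
#!/bin/sh
# Hypothetical job script; submit with: qsub job.sh
# Assumes server_priv/nodes on the Torque server lists both hosts with
# four processors each (node01/node02 are placeholder names):
#   node01 np=4
#   node02 np=4
# pbs_server typically has to be restarted after that file is edited.
#PBS -q long
#PBS -l nodes=2:ppn=4

# Run from the directory qsub was invoked in.
cd "$PBS_O_WORKDIR"

# Show the hosts Torque allocated (one line per processor slot).
cat "$PBS_NODEFILE"

# With tm support, mpiexec reads the allocation from Torque, so the
# 8 ranks should be placed 4 per node across the 8 allocated slots.
mpiexec -n 8 ./a.out
```

Requesting nodes=2:ppn=4 explicitly at submission overrides the queue's resources_default.nodes value, so the single-node default can stay in place for other jobs.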

Regards,
Gijsbert
