[torqueusers] Torque and OpenMPI

Simon Hammond simon.hammond at gmail.com
Tue Jan 13 08:52:26 MST 2009


Can't you request both nodes when you submit, with something like:

qsub -l nodes=2:ppn=4 .....

Are you submitting the job into the Torque queue using qsub?
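
To check what Torque actually hands the job, a rough sketch of a job script follows. Inside a job, Torque sets PBS_NODEFILE to a file with one line per allocated slot, so counting lines per host shows the layout. The helper name slots_per_host is made up for illustration, and the mpiexec line assumes your OpenMPI build has TM support (your error log mentions ras_tm_module, so it probably does) and that a.out sits in the directory you submit from.

```shell
#!/bin/sh
#PBS -l nodes=2:ppn=4
# Sketch of a job script; slots_per_host is a hypothetical helper name.

# Print "<host> <slots>" for every host listed in a PBS-style node file.
# The node file repeats each host name once per slot allocated on it.
slots_per_host() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# PBS_NODEFILE is only set inside a running Torque job.
if [ -n "${PBS_NODEFILE:-}" ]; then
    slots_per_host "$PBS_NODEFILE"
    # Assumption: a TM-aware mpiexec takes the host list from Torque,
    # so no hostfile is passed here.
    cd "$PBS_O_WORKDIR" && mpiexec -n 8 ./a.out
fi
```

With nodes=2:ppn=4 granted, the first part should print two lines, one per node, each showing 4 slots. If it shows only one host, the problem is in the resource request or queue defaults rather than in OpenMPI.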


Si Hammond

Performance Modelling and Analysis Team
High Performance Systems Group
University of Warwick


2009/1/13 Gijsbert Wiesenekker <gijsbert.wiesenekker at gmail.com>:
> I have built a two-node Linux cluster, each node with a quad-core CPU, running
> Fedora Core 10, Torque and OpenMPI.
> I have Torque and OpenMPI working on one node such that when I start a job
> with
> mpiexec -n 4 a.out
> it runs 4 copies of a.out on one node.
> (BTW, I got the error:
> [hostname:01936] [0,0,0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 173
> [hostname:01936] pls:tm: failed to poll for a spawned proc, return status = 17002
> [hostname:01936] [0,0,0] ORTE_ERROR_LOG: In errno in file rmgr_urm.c at line 462
> [hostname:01936] mpiexec: spawn failed with errno=-11
> After trying all kinds of combinations of server parameters, queue
> parameters and qsub parameters it turned out that I had to add the following
> line to my queue definition:
> set queue long resources_default.nodes = 1)
>
> My question is how I can configure Torque such that when I start my program
> with
> mpiexec -n 8 a.out
> it starts the job on both nodes, running 4 copies of a.out on each.
>
> Regards,
> Gijsbert

