[torqueusers] Torque and OpenMPI

Gijsbert Wiesenekker gijsbert.wiesenekker at gmail.com
Tue Jan 13 14:00:36 MST 2009


Brett Lee wrote:
> Gijsbert,  I believe Glen's suggestion answers your question.  Am 
> still learning myself, so at the risk of being wrong I'll direct you 
> to some MPI/OpenMP examples I've pieced together:
>
> http://www.etpenguin.com/pub/Clustering/HPC/Development/UserHome/pbs/scripts/ 
>
> -Brett
>
> Glen Beane wrote:
>> what does your torque script look like?  You should be specifying the
>> number of nodes.
>>
>> e.g. something like this:
>>
>> #!/bin/bash
>>
>> #PBS -l nodes=2:ppn=4
>>
>> cd $PBS_O_WORKDIR
>> mpiexec -n 8 ./a.out
>>
>> On Tue, Jan 13, 2009 at 6:30 AM, Gijsbert Wiesenekker
>> <gijsbert.wiesenekker at gmail.com> wrote:
>>> I have built a two-node Linux cluster, each node with a quad-core CPU,
>>> running Fedora Core 10, Torque and OpenMPI.
>>> I have Torque and OpenMPI working on one node, such that when I start a job with
>>> mpiexec -n 4 a.out
>>> it runs 4 copies of a.out on that node.
>>> (BTW, I got the error:
>>> [hostname:01936] [0,0,0] ORTE_ERROR_LOG: File open failure in file ras_tm_module.c at line 173
>>> [hostname:01936] pls:tm: failed to poll for a spawned proc, return status = 17002
>>> [hostname:01936] [0,0,0] ORTE_ERROR_LOG: In errno in file rmgr_urm.c at line 462
>>> [hostname:01936] mpiexec: spawn failed with errno=-11
>>> After trying all kinds of combinations of server parameters, queue
>>> parameters and qsub parameters, it turned out that I had to add the
>>> following line to my queue definition: set queue long resources_default.nodes = 1)
>>>
>>> My question is: how can I configure Torque such that when I start my program with
>>> mpiexec -n 8 a.out
>>> it starts the job on both nodes, running 4 copies of a.out on each?
>>>
>>> Regards,
>>> Gijsbert
>>>
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>
>
OK. So I don't have to change the Torque queue definitions? I was
thinking that I had to change a queue definition in some way so that it
was aware of the two nodes.
Does anyone know how this works? I submit a request to the queue on the
first node, and then Torque checks $PBS_NODEFILE to start the request on
the second node? In which queue is the request on the second node placed?
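For reference, my current understanding: Torque writes one line per allocated slot into $PBS_NODEFILE, and an OpenMPI that was built with TM support (--with-tm) reads that allocation through the TM API, so mpiexec needs no hostfile. A minimal job-script sketch along the lines of Glen's (assuming a TM-aware OpenMPI build; ./a.out is a placeholder for the MPI program):

```shell
#!/bin/bash
#PBS -l nodes=2:ppn=4

# Torque lists one line per allocated slot in $PBS_NODEFILE,
# e.g. node1 four times followed by node2 four times.
cd "$PBS_O_WORKDIR"
echo "Allocated slots:"
cat "$PBS_NODEFILE"

# Derive the process count from the nodefile instead of hard-coding -n 8.
NP=$(wc -l < "$PBS_NODEFILE")
mpiexec -n "$NP" ./a.out   # ./a.out is a placeholder for the MPI binary
```

With nodes=2:ppn=4 the nodefile should contain 8 lines, so NP ends up as 8 and mpiexec starts 4 processes on each node.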

Regards,
Gijsbert
