[torqueusers] qsub syntax for MPI Jobs
Aquil H. Abdullah
aabdullah at interactivesupercomputing.com
Mon Jan 28 12:35:01 MST 2008
Hello All,
I have recently installed Torque 2.1.8 on a two-node cluster, and I
have a question about submitting MPI jobs. When I submit a job with the
following syntax:
qsub -l nodes=2:ppn=4 -N StarpJob -V \
  -o /home/aha/.starp/.log/2008_01_28_1407_13/pbs_stdout \
  -e /home/aha/.starp/.log/2008_01_28_1407_13/pbs_stderr
And use a submission script (via STDIN) like:
/home/aha/starp/2.6.0_8364/bin/bin_launcher -using mpirun -1 \
  --file=/home/aha/.starp/.log/2008_01_28_1407_13/machine_file -np 8 \
  -r /home/aha/starp/2.6.0_8364/bin/mpi_ssh_wrapper \
  /home/aha/starp/2.6.0_8364/bin/starpserver . \
  /home/aha/starp/2.6.0_8364/opteron_linux/hpc_server/lib \
  /home/aha/starp/2.6.0_8364/opteron_linux/hpc_server/lib \
  -t 120 -k /home/aha/.starp/.tmp/2008_01_28_1407_13/starpmtpQ4K.txt
I get the following error in one of my log files:
mpiexec: unable to start all procs; may have invalid machine names
remaining specified hosts:
10.0.1.68 (orga1.isc-dev.com)
However, if I modify the job submission command as follows:
qsub -l nodes=2 -N StarpJob -V \
  -o /home/aha/.starp/.log/2008_01_28_1407_13/pbs_stdout \
  -e /home/aha/.starp/.log/2008_01_28_1407_13/pbs_stderr
The job submits just fine.
In the first case the machine file looks like:
orga2
orga2
orga2
orga2
orga1
orga1
orga1
orga1
In the second case the machine file looks like:
orga2
orga1
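To compare the two machine files, it can help to count how many slots each host received. The following is only a minimal sketch, assuming a POSIX shell with awk available; count_slots is a hypothetical helper name, not part of Torque or Intel MPI. Inside a Torque job the allocated hosts are listed in the file named by $PBS_NODEFILE; whether the launcher's machine_file is derived from that file is an assumption.

```shell
#!/bin/sh
# count_slots (hypothetical helper): print "<host> <slots>" for each
# distinct host in a machine file, in order of first appearance.
count_slots() {
    awk '
        NF == 0     { next }               # skip blank lines
        !($1 in n)  { order[++k] = $1 }    # remember first-seen order
                    { n[$1]++ }            # one line = one slot
        END         { for (i = 1; i <= k; i++) print order[i], n[order[i]] }
    ' "$1"
}

# Inside a running Torque job, summarize the node file if it is available.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    count_slots "$PBS_NODEFILE"
fi
```

For the first machine file above this reports 4 slots each for orga2 and orga1 (matching -np 8); for the second it reports 1 slot each.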
Hardware Configuration: Two dual-socket, dual-core Xeon systems (each box
has 4 cores)
Above I am using Intel MPI. However, I have also tried the OSC
mpiexec implementation, which can be integrated with Torque so that MPI
processes are spawned instead of exec'ed via ssh or rsh, and I ran
into a similar problem.
NOTE: I can also start a job on one node with the syntax:
qsub -l nodes=1:ppn=4
Any suggestions or illumination as to why I cannot submit a job to run
on both nodes using the ppn resource specification?
Thanks!
--
Aquil H. Abdullah
Application Engineer
Interactive Supercomputing
P: +1 781 419 5051
E: aabdullah at interactivesupercomputing.com
W: http://www.interactivesupercomputing.com