[torqueusers] qsub syntax for MPI Jobs

Aquil H. Abdullah aabdullah at interactivesupercomputing.com
Mon Jan 28 12:35:01 MST 2008


Hello All,
  I recently installed Torque 2.1.8 on a two-node cluster and I
have a question about submitting MPI jobs. When I submit a job with the
following syntax:

qsub -l nodes=2:ppn=4 -N StarpJob -V \
  -o /home/aha/.starp/.log/2008_01_28_1407_13/pbs_stdout \
  -e /home/aha/.starp/.log/2008_01_28_1407_13/pbs_stderr

And use a submission script (via STDIN) like:
/home/aha/starp/2.6.0_8364/bin/bin_launcher -using mpirun -1 \
  --file=/home/aha/.starp/.log/2008_01_28_1407_13/machine_file -np 8 \
  -r /home/aha/starp/2.6.0_8364/bin/mpi_ssh_wrapper \
  /home/aha/starp/2.6.0_8364/bin/starpserver . \
  /home/aha/starp/2.6.0_8364/opteron_linux/hpc_server/lib \
  /home/aha/starp/2.6.0_8364/opteron_linux/hpc_server/lib \
  -t 120 -k /home/aha/.starp/.tmp/2008_01_28_1407_13/starpmtpQ4K.txt

I get the following error in one of my log files:
mpiexec: unable to start all procs; may have invalid machine names
    remaining specified hosts:
        10.0.1.68 (orga1.isc-dev.com)

However, if I modify the job submission command as follows:

qsub -l nodes=2 -N StarpJob -V \
  -o /home/aha/.starp/.log/2008_01_28_1407_13/pbs_stdout \
  -e /home/aha/.starp/.log/2008_01_28_1407_13/pbs_stderr

The job submits just fine.

In the first case the machine file looks like:
orga2
orga2
orga2
orga2
orga1
orga1
orga1
orga1

In the second case the machine file looks like:
orga2
orga1
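
For comparing the two machine files, a small helper along these lines may be handy. It is only a sketch: the function name summarize_hosts is made up for illustration, and it assumes the file has one hostname per line, as in the listings above. Inside a Torque job the same function can be pointed at $PBS_NODEFILE, which Torque sets to the list of allocated hosts.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque or Star-P): print each distinct
# host in a machine file together with how many times it is listed.
summarize_hosts() {
    # sort groups identical hostnames; uniq -c counts each group;
    # awk swaps the columns into "host: N slot(s)".
    sort "$1" | uniq -c | awk '{ print $2 ": " $1 " slot(s)" }'
}

# When run inside a Torque job, summarize the allocated hosts.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "${PBS_NODEFILE}" ]; then
    summarize_hosts "$PBS_NODEFILE"
fi
```

Calling summarize_hosts on the machine file from each case shows at a glance whether every host has the number of entries requested with ppn.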

Hardware configuration: two dual-socket, dual-core Xeon systems (each box
has 4 cores).

The example above uses Intel MPI. I have also tried the OSC mpiexec
implementation, which can be integrated with Torque so that MPI
processes are spawned by Torque instead of being exec'ed via ssh or rsh,
and I ran into a similar problem.

NOTE: I can also start a job on one node with the syntax:
qsub -l nodes=1:ppn=4

Any suggestions or insight into why I cannot submit a job to run
on both nodes using the ppn resource specification?

Thanks!

-- 
Aquil H. Abdullah
Application Engineer
Interactive Supercomputing

P: +1 781 419 5051
E: aabdullah at interactivesupercomputing.com
W: http://www.interactivesupercomputing.com

