[torqueusers] qsub/mpirun problems

Zhiliang Hu zhu at iastate.edu
Wed Sep 17 20:06:15 MDT 2008


Sorry for cross-posting -- I didn't get this problem solved on other lists:

We are running an 8-node Linux (CentOS) cluster. When submitting an mpiblast job with "qsub", I ran into this dilemma: what is the correct way to supply the node information? To "qsub" (-l nodes=6:ppn=2), to "mpirun" (-np 12 -machinefile /path/to/mpimachines), or to both? My qsub attempts all failed (details below).

Any advice is appreciated.

Zhiliang


PS: My trials (each is a single command line; I have broken them across lines for readability):

(1) 
The following mpiblast runs fine on our CentOS cluster:
------------------------------------------------------
 /path/to/bin/mpirun -np 12 -machinefile /path/to/mpimachines 
    /path/to/mpiblast
      -p blastn
      -d seq.db
      -i /path/to/input.seq 
      -o /path/to/output.txt
------------------------------------------------------

(2)
When I try to submit the job with 'qsub', it fails:
--------------------------------------
qsub -l nodes=6:ppn=2
     -e /path/to/locationA
     -o /path/to/locationA
     /path/to/program

  where "program" is:

  /path/to/bin/mpirun
    /path/to/mpiblast
      -p blastn
      -d seq.db
      -i /path/to/input.seq 
      -o /path/to/output.txt
--------------------------------------
Torque's "..ER" file says: "Sorry, mpiBLAST must be run on 3 or more nodes". (The same message also shows up in the node's /undelivered/ errors.)

A SIDE NOTE: This worked before on this machine but for some weird reason it is failing now.
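
One way to narrow down trial (2) is to have the job script report what it was actually given before mpirun starts. The following is only a minimal sketch: it assumes Torque sets PBS_NODEFILE (and PBS_JOBID) inside the job and that the file lists one hostname per allocated processor. summarize_allocation is a hypothetical helper name, not something from the script above.

```shell
#!/bin/sh
# Sketch only: report how many slots Torque allocated on each host.
# Assumption: $PBS_NODEFILE lists one hostname per allocated processor.

# summarize_allocation is a hypothetical helper: prints "count hostname" per host.
summarize_allocation() {
    sort "$1" | uniq -c
}

# Only report when running inside a Torque job (PBS_NODEFILE set and readable).
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Allocation for job ${PBS_JOBID:-unknown}:"
    summarize_allocation "$PBS_NODEFILE"
fi
```

If the report shows fewer than three slots, or nothing at all, the problem would be in what the job receives rather than in mpiblast itself.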


(3)
But if I also specify the node information to mpirun, as in:
--------------------------------------
qsub -l nodes=6:ppn=2
     -e /path/to/locationA
     -o /path/to/locationA
     /path/to/program

  where "program" is:

  /path/to/bin/mpirun -np 12 -machinefile /path/to/mpimachines 
    /path/to/mpiblast
      -p blastn
      -d seq.db
      -i /path/to/input.seq 
      -o /path/to/output.txt
--------------------------------------
It fails with error: "pls:tm: failed to poll for a spawned proc, return status = 17002".
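
A possible variant of trial (3), given as an untested sketch rather than a confirmed fix: take both the process count and the machine file from the allocation Torque hands to the job, instead of from a fixed /path/to/mpimachines. It assumes Torque exports PBS_NODEFILE inside the job with one hostname per allocated processor, and that this mpirun accepts -np and -machinefile as in trial (1). count_slots is a hypothetical helper name.

```shell
#!/bin/sh
#PBS -l nodes=6:ppn=2
# Sketch only: derive -np and -machinefile from the Torque allocation.
# Assumption: $PBS_NODEFILE lists one hostname per allocated processor.

# count_slots is a hypothetical helper: prints the number of non-empty lines.
count_slots() {
    grep -c . "$1"
}

# Launch only when running inside a Torque job (PBS_NODEFILE set and readable).
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NP=$(count_slots "$PBS_NODEFILE")
    /path/to/bin/mpirun -np "$NP" -machinefile "$PBS_NODEFILE" \
        /path/to/mpiblast \
        -p blastn \
        -d seq.db \
        -i /path/to/input.seq \
        -o /path/to/output.txt
fi
```

With the resource request in the "#PBS" line, the script would be submitted as "qsub -e /path/to/locationA -o /path/to/locationA /path/to/program", so the node count is stated in one place only.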

-- what's the proper way to queue mpiblast jobs?

Zhiliang


