[torqueusers] qsub/mpirun problems

Steve Young chemadm at hamilton.edu
Wed Sep 17 20:17:53 MDT 2008


On our setup we need to use mpiexec, not mpirun. It mostly depends on
which MPI installation you have. Each has its own quirks ;-).
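
As a rough sketch only (this assumes a typical Torque setup, not necessarily yours): Torque sets $PBS_NODEFILE inside a running job to a file with one line per allocated processor slot, so the job script can take the process count and the machine file from there instead of a hand-maintained /path/to/mpimachines. The helper name count_slots is made up for illustration. The paths and mpiblast arguments are copied from the original post. Whether your launcher needs -np and -machinefile at all depends on how your MPI was built.

```shell
#!/bin/sh
#PBS -l nodes=6:ppn=2

# Hypothetical helper: count the processor slots Torque allocated.
# The node file has one line per slot, so nodes=6:ppn=2 gives 12 lines.
count_slots() {
    grep -c . "$1"
}

# Only launch when running inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    # PBS_O_WORKDIR is the directory qsub was run from.
    cd "$PBS_O_WORKDIR" || exit 1
    np=$(count_slots "$PBS_NODEFILE")

    # Swap mpirun for mpiexec here if that is what your installation needs.
    /path/to/bin/mpirun -np "$np" -machinefile "$PBS_NODEFILE" \
        /path/to/mpiblast -p blastn -d seq.db \
        -i /path/to/input.seq -o /path/to/output.txt
fi
```

Submitted with plain "qsub /path/to/program", the resource request comes from the #PBS line, so the node count is stated in one place only.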


On Sep 17, 2008, at 10:06 PM, Zhiliang Hu wrote:

> Sorry for cross posting -- I didn't get the problem solved on other  
> lists:
>
> We are running an 8-node Linux CentOS cluster. When submitting an
> mpiblast job with "qsub", I ran into this dilemma: what is the correct
> way to supply the node information: to "qsub" (-l nodes=6:ppn=2), to
> "mpirun" (-np 12 -machinefile /path/to/mpimachines), or to both? All
> of these failed in my trials (details below).
>
> Any advice is appreciated.
>
> Zhiliang
>
>
> P.S. My trials (each command is on one line; I have broken them up
> here for readability):
>
> (1)
> The following mpiblast runs fine on our CentOS cluster:
> ------------------------------------------------------
> /path/to/bin/mpirun -np 12 -machinefile /path/to/mpimachines
>    /path/to/mpiblast
>      -p blastn
>      -d seq.db
>      -i /path/to/input.seq
>      -o /path/to/output.txt
> ------------------------------------------------------
>
> (2)
> When I submit the job with 'qsub', it fails:
> --------------------------------------
> qsub -l nodes=6:ppn=2
>     -e /path/to/locationA
>     -o /path/to/locationA
>     /path/to/program
>
>  where "program" is:
>
>  /path/to/bin/mpirun
>    /path/to/mpiblast
>      -p blastn
>      -d seq.db
>      -i /path/to/input.seq
>      -o /path/to/output.txt
> --------------------------------------
> Torque's "..ER" file says: "Sorry, mpiBLAST must be run on 3 or
> more nodes". (The same error also appears in the node's /undelivered/
> directory.)
>
> A SIDE NOTE: This worked before on this machine but for some weird  
> reason it is failing now.
>
>
> (3)
> But if I also pass the node information to mpirun, as in:
> --------------------------------------
> qsub -l nodes=6:ppn=2
>     -e /path/to/locationA
>     -o /path/to/locationA
>     /path/to/program
>
>  where "program" is:
>
>  /path/to/bin/mpirun -np 12 -machinefile /path/to/mpimachines
>    /path/to/mpiblast
>      -p blastn
>      -d seq.db
>      -i /path/to/input.seq
>      -o /path/to/output.txt
> --------------------------------------
> it fails with the error: "pls:tm: failed to poll for a spawned proc,
> return status = 17002".
>
> So, what is the proper way to queue mpiblast jobs?
>
> Zhiliang
>