[torqueusers] qsub/mpirun problems
Steve Young
chemadm at hamilton.edu
Wed Sep 17 20:17:53 MDT 2008
On our setup we need to use mpiexec, not mpirun. It mostly depends on
which MPI installation you have. Each has its own quirks ;-).
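
For what it's worth, here is a minimal sketch of a job script that takes
the node list from Torque instead of a hand-maintained machinefile. It
assumes a launcher that accepts -np and -machinefile, as in your trial
(1); whether your MPI's launcher wants these flags at all depends on the
installation, and you may need to swap in mpiexec. The helper name
count_slots is made up for illustration. $PBS_NODEFILE is set by Torque
inside a job and has one line per allocated processor slot.

```shell
#!/bin/sh
#PBS -l nodes=6:ppn=2
#PBS -e /path/to/locationA
#PBS -o /path/to/locationA

# Hypothetical helper: count the processor slots listed in a node file.
# The file has one line per slot, so the line count is the process count.
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Torque sets PBS_NODEFILE inside a job; outside a job this block is skipped.
if [ -n "${PBS_NODEFILE:-}" ]; then
    np=$(count_slots "$PBS_NODEFILE")
    # Launcher path and mpiblast arguments are copied from trial (1).
    /path/to/bin/mpirun -np "$np" -machinefile "$PBS_NODEFILE" \
        /path/to/mpiblast -p blastn -d seq.db \
        -i /path/to/input.seq -o /path/to/output.txt
fi
```

Saved as /path/to/program, this would be submitted with a plain
"qsub /path/to/program", since the #PBS lines carry the resource request.
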
On Sep 17, 2008, at 10:06 PM, Zhiliang Hu wrote:
> Sorry for cross posting -- I didn't get the problem solved on other
> lists:
>
> We are running an 8-node Linux CentOS cluster. When submitting an
> mpiblast job with "qsub", I ran into this dilemma: what is the correct
> way to supply the node information: to "qsub" (-l nodes=6:ppn=2), to
> "mpirun" (-np 12 -machinefile /path/to/mpimachines), or to both? All
> of these failed in my trials (details below).
>
> Any advice is appreciated.
>
> Zhiliang
>
>
> ps: My trials (each command is on one line; I break them up here for
> readability):
>
> (1)
> The following mpiblast runs fine on our CentOS cluster:
> ------------------------------------------------------
> /path/to/bin/mpirun -np 12 -machinefile /path/to/mpimachines
> /path/to/mpiblast
> -p blastn
> -d seq.db
> -i /path/to/input.seq
> -o /path/to/output.txt
> ------------------------------------------------------
>
> (2)
> When I try to send the job with 'qsub', it has problems:
> --------------------------------------
> qsub -l nodes=6:ppn=2
> -e /path/to/locationA
> -o /path/to/locationA
> /path/to/program
>
> where "program" is:
>
> /path/to/bin/mpirun
> /path/to/mpiblast
> -p blastn
> -d seq.db
> -i /path/to/input.seq
> -o /path/to/output.txt
> --------------------------------------
> Torque's "..ER" file says: "Sorry, mpiBLAST must be run on 3 or
> more nodes". (The same error also appears in the node's /undelivered/
> directory.)
>
> A SIDE NOTE: This worked before on this machine but for some weird
> reason it is failing now.
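
One thing worth checking for trial (2): that message may simply mean that
mpirun started fewer than three processes, since no -np was given there.
A quick sketch for seeing what Torque actually allocated, to be run from
inside the job script. The helper name nodefile_summary is made up for
illustration; $PBS_NODEFILE is set by Torque inside a job.

```shell
# Hypothetical helper: report the slots and distinct hosts in a node file.
nodefile_summary() {
    slots=$(wc -l < "$1" | tr -d ' ')           # one line per processor slot
    hosts=$(sort -u "$1" | wc -l | tr -d ' ')   # unique host names
    echo "slots=$slots hosts=$hosts"
}

# Only meaningful inside a Torque job, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    nodefile_summary "$PBS_NODEFILE"
fi
```

With -l nodes=6:ppn=2 one would expect 12 slots on 6 hosts; anything
less would point at the allocation rather than at mpiblast.
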
>
>
> (3)
> But if I specify the node info as in:
> --------------------------------------
> qsub -l nodes=6:ppn=2
> -e /path/to/locationA
> -o /path/to/locationA
> /path/to/program
>
> where "program" is:
>
> /path/to/bin/mpirun -np 12 -machinefile /path/to/mpimachines
> /path/to/mpiblast
> -p blastn
> -d seq.db
> -i /path/to/input.seq
> -o /path/to/output.txt
> --------------------------------------
> It fails with error: "pls:tm: failed to poll for a spawned proc,
> return status = 17002".
>
> -- what's the proper way to queue mpiblast jobs?
>
> Zhiliang