[torqueusers] qsub ... queue hang

Glen Beane glen.beane at gmail.com
Wed Dec 5 10:17:06 MST 2007


On Dec 5, 2007 11:34 AM, Zhiliang Hu <zhu at iastate.edu> wrote:
> At 11:37 PM 12/4/2007, Garrick Staples wrote:
>
> >>> sh run  | qsub
> >>> 49.cluster2.xxxx.xxxxxxx.xxx
> >>>
> >>> -- it hangs there forever:
> >
> >'sh run' is executing and qsub is waiting for it to exit so it can submit the
> >output as a job.
> >
> >I think you want 'echo sh run | qsub'.
>
> That makes sense!
>
> Now I tried:
>  > qsub run
> -- It sends the "run" job to the queue and stays there (hang).
>
>  > qsub -l nodes=6 run
> -- It sends the "run" job to the queue, and took a little while to disappear from the queue (it worked! :-).  But I don't see anything back.  I then tried another "run" job in which it directs output to a file:
>
>   /opt/openmpi.gcc/bin/mpirun -np 12 -machinefile ./machines
>     /usr/local/bin/mpiblast -p blastn
>     -i /home/zhu/tests/mpiblast/datain4
>     -d bta.genome.chr
>     -o /home/zhu/tests/mpiblast/out
>
> However this job did appear on, and then disappear from, the queue; but I don't see output anywhere (Note: the script runs well without "qsub").
>
> This brings up a few more questions:
>
> 1. Of course -- where does the output go?

After the job runs, a file named job_name.ojob_num should be created
with the job's stdout, and a file named job_name.ejob_num with its
stderr.  If those don't show up, you need to find out why.  It could be
a problem with your pbs_mom configuration (look at the mom_logs).
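
As a rough illustration of that naming rule, here is a small sketch.
The helper name pbs_output_files is made up for this example and is not
part of TORQUE.  It assumes the job name defaults to the script name
and that job_num is the part of the job id before the first dot.  By
default the files should land in the directory you ran qsub from.

```shell
# Hypothetical helper (not a TORQUE command): print the default
# stdout/stderr file names for a job.
#   $1 = job name (by default the name of the submitted script)
#   $2 = job id exactly as printed by qsub
pbs_output_files() {
    job_name=$1
    job_num=${2%%.*}    # keep only the number before the first dot
    printf '%s.o%s\n%s.e%s\n' "$job_name" "$job_num" "$job_name" "$job_num"
}

# Job id taken from the earlier message in this thread.
pbs_output_files run 49.cluster2.xxxx.xxxxxxx.xxx
```

If neither file shows up in the submit directory after the job leaves
the queue, that is the point to go read the mom_logs on the node that
ran the job.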

> 2. It appears "qsub" needs to know the number of nodes
>    to run the job. However, "mpirun" also needs this.
>
>  - I can use a "machinefile" to tell "mpirun" which node to use;
>    How can I do similar to "qsub"?

Build Open MPI with TM support and you won't need to pass it a machine
file: it will use pbs_mom to launch the remote processes.  Look at
the Open MPI documentation about integrating with OpenPBS, PBS Pro, and
TORQUE.
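
One way to check whether an existing Open MPI install already has TM
support is to look for "tm" components in the ompi_info output.  The
sketch below is only that, a sketch: the function name has_tm_support
is made up, and the pattern assumes ompi_info prints component lines of
the form "MCA <framework>: tm (...)", which may vary between versions.

```shell
# Hypothetical helper: succeed if the text on stdin lists a "tm" component.
# It reads stdin so the pattern can be tried without Open MPI installed.
has_tm_support() {
    grep -Eq 'MCA [a-z]+: tm( |$)'
}

# Only query a real installation when ompi_info is on the PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "Open MPI appears to have TM support"
    else
        # <TORQUE prefix> is a placeholder for wherever TORQUE is installed.
        echo "no tm components found; rebuild with ./configure --with-tm=<TORQUE prefix>"
    fi
fi
```

With TM support in place, mpirun inside a job should pick up the node
list from TORQUE on its own, so the -machinefile option can be dropped.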


>  - I have 2 processors on each node so I can specify "-np 12"
>    to tell "mpirun" to fire up 12 processes on 6 nodes.
>    How can I let "qsub" know the same info?

Use "qsub -l nodes=6:ppn=2" on the command line, or put
"#PBS -l nodes=6:ppn=2" in your script.
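
Put together, a job script along these lines might work.  Treat it as
a sketch under a few assumptions: the paths are the ones from your
message, the helper count_slots is made up for this example, and the
Open MPI build here does not have TM support, so the node file that
TORQUE writes ($PBS_NODEFILE, one line per processor slot) is passed as
the machine file.

```shell
#!/bin/sh
#PBS -l nodes=6:ppn=2

# Hypothetical helper: count the lines in a node file.
# With nodes=6:ppn=2 the file has 12 lines (each node listed twice).
count_slots() {
    wc -l < "$1" | tr -d ' '
}

# Run only inside a job, where PBS_NODEFILE and PBS_O_WORKDIR are set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    # Jobs start in the home directory; go back to the submit directory.
    cd "$PBS_O_WORKDIR" || exit 1
    NP=$(count_slots "$PBS_NODEFILE")
    /opt/openmpi.gcc/bin/mpirun -np "$NP" -machinefile "$PBS_NODEFILE" \
        /usr/local/bin/mpiblast -p blastn \
        -i /home/zhu/tests/mpiblast/datain4 \
        -d bta.genome.chr \
        -o /home/zhu/tests/mpiblast/out
fi
```

Submitting it with a plain "qsub run" is then enough, since the
resource request is read from the #PBS line, and -np always matches
whatever was requested.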

