[torqueusers] qsub ... queue hang
glen.beane at gmail.com
Wed Dec 5 10:17:06 MST 2007
On Dec 5, 2007 11:34 AM, Zhiliang Hu <zhu at iastate.edu> wrote:
> At 11:37 PM 12/4/2007, Garrick Staples wrote:
> >>> sh run | qsub
> >>> 49.cluster2.xxxx.xxxxxxx.xxx
> >>> -- it hangs there forever:
> >'sh run' is executing and qsub is waiting for it to exit so it can submit the
> >output as a job.
> >I think you want 'echo sh run | qsub'.
> That makes sense!
> Now I tried:
> > qsub run
> -- It sends the "run" job to the queue and stays there (hang).
> > qsub -l nodes=6 run
> -- It sends the "run" job to the queue, and took a little while to disappear from the queue (it worked! :-). But I don't see anything back. I then tried another "run" job in which it directs output to a file:
> /opt/openmpi.gcc/bin/mpirun -np 12 -machinefile ./machines
> /usr/local/bin/mpiblast -p blastn
> -i /home/zhu/tests/mpiblast/datain4
> -d bta.genome.chr
> -o /home/zhu/tests/mpiblast/out
> However this job did appear on, and then disappear from, the queue; but I don't see output anywhere (Note: the script runs well without "qsub").
> This brings up a few more questions:
> 1. Of course -- where does the output go?
After the job runs, a job_name.ojob_num file should be created with the
stdout, and a job_name.ejob_num file should be created with the
stderr. If those don't show up, you need to find out why. It could be
a problem with your pbs_mom configuration (look at the mom_logs).
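
As a rough sketch of that naming rule, the helper below prints the two file
names to look for in the directory you submitted from. The function name
expected_output_files is made up for illustration, and pairing the job name
"run" with the job id quoted earlier in this thread is only an example. It
assumes the default naming, with no -o, -e, -j or -N options given to qsub.

```shell
# Hypothetical helper: derive the default stdout/stderr file names
# from a job name and the job id that qsub prints on submission.
expected_output_files() {
    job_name=$1
    job_id=$2
    # The numeric part before the first dot is the job number.
    job_num=${job_id%%.*}
    printf '%s.o%s\n%s.e%s\n' "$job_name" "$job_num" "$job_name" "$job_num"
}

# Example call, using the job id format seen in this thread.
expected_output_files run 49.cluster2.xxxx.xxxxxxx.xxx
```

If neither file appears after the job leaves the queue, the mom_logs
directory on the first node of the job is the next place to look.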
> 2. It appears "qsub" needs to know the number of nodes
> to run the job. However, "mpirun" also requires this.
> - I can use a "machinefile" to tell "mpirun" which node to use;
> How can I do similar to "qsub"?
Build Open MPI with TM support, and you won't need to pass it a machine
file. It will use pbs_mom to launch the remote processes. Look at
the Open MPI documentation about integrating with OpenPBS, PBS Pro, and
> - I have 2 processors on each node so I can specify "-np 12"
> to tell "mpirun" to fire up 12 processes on 6 nodes.
> How can I let "qsub" know the same info?
Use qsub -l nodes=6:ppn=2 on the command line, or put
#PBS -l nodes=6:ppn=2 in your script.
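
Here is a minimal sketch of what such a script could look like. The file
name run.pbs is an assumption; the mpirun and mpiblast paths and arguments
are copied from the script quoted above. Passing $PBS_NODEFILE as the
machine file is a fallback for an Open MPI build without TM support, not
something required once TM support is in place. The block only writes the
script; submitting it is left as a separate step.

```shell
# Write the job script to run.pbs (hypothetical file name) without submitting it.
cat > run.pbs <<'EOF'
#!/bin/sh
#PBS -l nodes=6:ppn=2

# The job starts in the home directory; go to where qsub was invoked.
cd "$PBS_O_WORKDIR"

# PBS_NODEFILE has one line per allocated processor slot,
# so nodes=6:ppn=2 should give 12 lines.
NP=$(( $(wc -l < "$PBS_NODEFILE") ))

/opt/openmpi.gcc/bin/mpirun -np "$NP" -machinefile "$PBS_NODEFILE" \
    /usr/local/bin/mpiblast -p blastn \
    -i /home/zhu/tests/mpiblast/datain4 \
    -d bta.genome.chr \
    -o /home/zhu/tests/mpiblast/out
EOF

# Submit with: qsub run.pbs
```

With a TM-enabled Open MPI, the -np and -machinefile arguments should be
unnecessary, since mpirun can get the node list from pbs_mom directly.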