[torqueusers] qsub ... queue hang

Zhiliang Hu zhu at iastate.edu
Wed Dec 5 09:34:55 MST 2007


At 11:37 PM 12/4/2007, Garrick Staples wrote:

>>> sh run  | qsub
>>> 49.cluster2.xxxx.xxxxxxx.xxx
>>>
>>> -- it hangs there forever:
>
>'sh run' is executing and qsub is waiting for it to exit so it can submit the
>output as a job.
>
>I think you want 'echo sh run | qsub'.
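To see what Garrick means, here is a small sketch in which `cat` stands in for qsub, since both simply read the job script from stdin. The temporary script is a hypothetical stand-in for `run`. qsub reads its whole stdin before it submits anything. So with `sh run | qsub`, the entire script first runs on the login node, and only its printed output would become the job.

```shell
#!/bin/sh
# Hypothetical stand-in for the user's "run" script.
job=$(mktemp)
printf 'echo hello from the job\n' > "$job"

# "cat" stands in for qsub: both just read the job script from stdin.
via_sh=$(sh "$job" | cat)         # script runs HERE; only its output is piped
via_echo=$(echo "sh $job" | cat)  # the command text itself is piped

echo "sh run | qsub      would submit: $via_sh"
echo "echo sh run | qsub would submit: $via_echo"
rm -f "$job"
```

Passing the file name instead (`qsub run`, as tried below) submits the contents of `run` itself as the job script.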

That makes sense!

Now I tried:
 > qsub run
-- It sends the "run" job to the queue, where it stays (hangs).

 > qsub -l nodes=6 run
-- It sends the "run" job to the queue, and after a little while it disappears from the queue (so it worked! :-). But I don't get anything back. I then tried another "run" script that directs its output to a file:

  /opt/openmpi.gcc/bin/mpirun -np 12 -machinefile ./machines \
    /usr/local/bin/mpiblast -p blastn \
    -i /home/zhu/tests/mpiblast/datain4 \
    -d bta.genome.chr \
    -o /home/zhu/tests/mpiblast/out

This job also appeared in the queue and then disappeared from it, but I don't see its output anywhere (note: the script runs fine without qsub).

This brings up a few more questions:

1. Of course -- where does the output go?
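A small sketch of Torque's default naming rule, as described in the qsub documentation. Unless `-o`, `-e` or `-N` say otherwise, the job's stdout and stderr come back as `<job name>.o<number>` and `<job name>.e<number>` in the directory qsub was run from. The job name is the script's file name, or `STDIN` when the script was piped into qsub. The helper function `torque_output_files` is hypothetical and only spells out that rule.

```shell
#!/bin/sh
# Print the default stdout/stderr file names Torque uses for a job.
# $1 = job name, $2 = job id as printed by qsub.
torque_output_files() {
    name=$1
    num=${2%%.*}    # keep the numeric part: "49.cluster2..." -> "49"
    echo "${name}.o${num} ${name}.e${num}"
}

torque_output_files run 49.cluster2.xxxx.xxxxxxx.xxx    # run.o49 run.e49
torque_output_files STDIN 49.cluster2.xxxx.xxxxxxx.xxx  # STDIN.o49 STDIN.e49
```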

2. It appears that "qsub" needs to know the number of nodes
   to run the job. But "mpirun" needs this as well.

 - I can use a "machinefile" to tell "mpirun" which nodes to use;
   how can I do the same with "qsub"?
 - I have 2 processors on each node, so I specify "-np 12"
   to tell "mpirun" to start 12 processes on 6 nodes.
   How can I give "qsub" the same information?
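A minimal job-script sketch, assuming Open MPI's mpirun and the paths from the script above. The job name `mpiblast` and the fallback branch are illustrative additions, not something from this thread. `-l nodes=6:ppn=2` asks Torque for 6 nodes with 2 processors each. Torque writes the allocated hosts to the file named by `$PBS_NODEFILE`, one line per processor, so mpirun can take both `-np` and `-machinefile` from it instead of from a hand-maintained `./machines`. The `#PBS` lines are read by qsub as if they were given on its command line, so the script can be submitted with a plain `qsub run`.

```shell
#!/bin/sh
#PBS -N mpiblast
#PBS -l nodes=6:ppn=2
#PBS -j oe
# Hypothetical job script "run"; the #PBS lines above are qsub options.

# Torque starts the job in $HOME; go back to the directory qsub was run from
# so that relative paths such as ./machines resolve as they do interactively.
cd "${PBS_O_WORKDIR:-.}" || exit 1

if [ -n "$PBS_NODEFILE" ]; then
    # Under Torque: one line per allocated processor (6 nodes x ppn=2 = 12).
    NODEFILE=$PBS_NODEFILE
    NP=$(wc -l < "$NODEFILE")
else
    # Run by hand outside Torque: use the existing machinefile, as before.
    NODEFILE=./machines
    NP=12
fi

/opt/openmpi.gcc/bin/mpirun -np "$NP" -machinefile "$NODEFILE" \
    /usr/local/bin/mpiblast -p blastn \
    -i /home/zhu/tests/mpiblast/datain4 \
    -d bta.genome.chr \
    -o /home/zhu/tests/mpiblast/out
```

With `-N mpiblast` and `-j oe`, the job's stdout and stderr should come back as a single `mpiblast.o<job number>` file in the submission directory. If Open MPI was built with Torque (tm) support, mpirun may pick up the allocation on its own, but passing `$PBS_NODEFILE` explicitly does not depend on that.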

These questions may seem simple to experts, but I have had a hard time extracting useful information from the few Torque tutorial web sites I have looked at. Any hints would be appreciated...

Thanks in advance!

Zhiliang 
