[torqueusers] Batch not running : Things to check

Sreedhar Manchu sm4082 at nyu.edu
Wed Sep 14 11:24:56 MDT 2011


Hi Dave,

This line tells pbs_mom which directories on the host should be staged to which destination directory on the compute node. This is what I found in the TORQUE documentation; I've included the link to the page below. Hopefully this helps.


$usecp
  Format:      <HOST>:<SRCDIR> <DSTDIR>
  Description: Specifies which directories should be staged (see TORQUE Data Management)
  Example:     $usecp *.fte.com:/data /usr/local/data

http://www.clusterresources.com/torquedocs21/a.cmomconfig.shtml
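
To make the mapping concrete, here is a rough shell sketch of how I understand a $usecp rule to be applied: if the destination host matches <HOST> and the path lies under <SRCDIR>, the file is copied locally to the same relative path under <DSTDIR>. This is only an illustration and not pbs_mom's actual code. The function name usecp_map and the host and file names in the usage line are made up for the example.

```shell
#!/bin/sh
# Illustration only: mimics how a $usecp rule could map a remote
# destination (host:path) to a local path. NOT pbs_mom's real logic.
# usecp_map is a hypothetical helper name.
usecp_map() {
    rule_host=$1   # <HOST> pattern from the rule, e.g. "*.fte.com"
    rule_src=$2    # <SRCDIR>, e.g. /data
    rule_dst=$3    # <DSTDIR>, e.g. /usr/local/data
    target=$4      # destination written as host:path

    host=${target%%:*}   # text before the first colon
    path=${target#*:}    # text after the first colon

    # The host must match the (possibly wildcarded) pattern.
    case $host in
        $rule_host) ;;
        *) return 1 ;;
    esac

    # The path must be SRCDIR itself or something underneath it.
    case $path in
        "$rule_src")
            printf '%s\n' "$rule_dst" ;;
        "$rule_src"/*)
            rest=${path#"$rule_src"}
            printf '%s\n' "$rule_dst$rest" ;;
        *)
            return 1 ;;
    esac
}

# Hypothetical host and file name, using the rule from the documentation example:
usecp_map '*.fte.com' /data /usr/local/data node1.fte.com:/data/run1/out.txt
# prints /usr/local/data/run1/out.txt
```

With a rule like "$usecp crunch.its.nyu.edu:/home /home", the source and destination directories are the same, so the path is unchanged. That only makes sense when /home is the same shared filesystem on the server and on the compute nodes.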

Best,
Sreedhar.

On Sep 14, 2011, at 1:18 PM, Zarnoch, Dave wrote:

> Sreedhar.
> 
>  
> 
> Thanks for your suggestion!
> 
>  
> 
> Just a question….
> 
> The second line:
> 
> $usecp crunch.its.nyu.edu:/home /home
>  
> 
> Is this because the script that I’m running is located in /home or is the location “/home” used for something else?
> 
>  
> 
> Thanks!
> 
>  
> 
> Dave
> 
>  
> 
> Dave Zarnoch
> 
> UNIX Systems Administration
> 
> (215)200-0911
> 
> Dave.Zarnoch at sykes.com
> 
>  
> 
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Sreedhar Manchu
> Sent: Wednesday, September 14, 2011 1:11 PM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] Batch not running : Things to check
>  
> 
> Hi Dave,
> 
>  
> 
> This is what I have in my config file.
> 
>  
> 
> $pbsserver crunch.local
> 
> $usecp crunch.its.nyu.edu:/home /home
> 
> $spool_as_final_name true
> 
>  
> 
> I think you need to mention the second line.
> 
>  
> 
> Best,
> 
> Sreedhar.
> 
>  
> 
> On Sep 14, 2011, at 12:48 PM, Zarnoch, Dave wrote:
> 
> 
> 
> 
> James,
> 
>  
> 
> I tried entering:
> 
> qsub -V -I  -l nodes=1 -q dn
> 
> and it just hangs there
> 
>  
> 
> Do I have a problem with “mom”?
> 
> Here’s some files in mom_priv:
> 
>  
> 
> usphl1ora002@/var/spool/torque/mom_priv>ls -l jobs
> total 0
>  
> usphl1ora002@/var/spool/torque/mom_priv>more config
> $pbsserver      usphl1ora002.amer.sykes.com      # note: hostname running pbs_server
> $logevent       255     # bitmap of which events to log
>  
> 
> usphl1ora002@/var/spool/torque/mom_priv>more mom.lock
> 25994
>  
> usphl1ora002@/var/spool/torque/mom_priv>ps -ef | grep 25994 | grep -v grep
> root     25994     1  0 Sep12 ?        00:01:03 /usr/local/sbin/pbs_mom -p
>  
> 
> Not really familiar with “mom”
> 
>  
> 
> I also don’t have a lot of documentation on Torque…
> 
> Do you know of any good web pages?
> 
> Thanks!
> 
> Dave
> 
> 
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Coyle, James J [ITACD]
> Sent: Wednesday, September 14, 2011 12:00 PM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] Batch not running : Things to check
>  
> 
> Dave,
> 
>  
> 
>   Welcome to Torque. I switched from NQS some time ago, and Torque/PBS has been a good replacement for me.
> 
> Things to check:
> 
>   What does the error output say? ( Probably in file dn_test.txt.e[0-9]* )
> 
>   Permissions on  /home/zarnocda/torque/scripts_test/dn_test.sh , is it executable? 
> 
> You may need:    chmod u+x /home/zarnocda/torque/scripts_test/dn_test.sh
> 
>  
> 
>   I’d also check if /home/zarnocda/torque/scripts_test/dn_test.sh
> 
> even exists on the compute node.
> 
> e.g.  ls /home/zarnocda/torque/scripts_test/dn_test.sh  executable
> 
>  
> 
>   I usually use the interactive option ( qsub -I ) to debug these kinds of problems.
> 
> You could issue:
> 
> qsub -V -I  -l nodes=1 -q dn
> 
> which will start an interactive job and log you into the mother superior node for that job,
> 
> where you can then try issuing the commands from the job that is not working.
> 
>  
> 
> James Coyle, PhD
> High Performance Computing Group       
>  Iowa State Univ.         
> web: http://jjc.public.iastate.edu/
>  
> 
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Zarnoch, Dave
> Sent: Wednesday, September 14, 2011 8:42 AM
> To: torqueusers at supercluster.org
> Subject: [torqueusers] Batch not running
> Importance: High
>  
> 
> Hello folks,
> 
> New to Torque, used to run NQS….
> 
> Concerning Torque…
> 
> I have a small script:
> 
> $ more dn_test.sh
> #!/bin/sh
> #
> PATH=/bin:/usr/bin:/usr/local/bin:/etc:/usr/sbin:/usr/ucb:$HOME/bin:/usr/bin/X11:/sbin:.
> export PATH
> DATE=`date +%H%M`
> echo "Hello"
> touch /tmp/dn_test_${DATE}
> sleep 90
>  
> When I submit the script:
>  
> qsub -V -l nodes=1 -q dn dn_test.sh
>  
> It runs fine.
>  
> But I need to run batch…
>  
> I created a text file “dn_test.txt”
>  
> That contains:
>  
> /home/zarnocda/torque/scripts_test/dn_test.sh
>  
> When I run:
>  
> qsub -V -l nodes=1 -q dn dn_test.txt
>  
> 
> It appears to process the file:
> 
> qstat -s
> 
> Job id                    Name             User            Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 7592.usphl1ora002.amer    dn_test.txt      zarnocda               0 R dn
> 
>  
> 
> But it doesn’t execute the script within:
> 
> /home/zarnocda/torque/scripts_test/dn_test.sh
>  
> 
> Any help!
> 
>  
> 
> Thanks!
> 
>  
> 
> Dave
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
