[torqueusers] Batch not running : Things to check
Sreedhar Manchu
sm4082 at nyu.edu
Wed Sep 14 11:24:56 MDT 2011
Hi Dave,
This line tells pbs_mom which directories on the given host should be staged to the destination directory on the compute node. This is what I found in the TORQUE documentation; the relevant entry and a link to the page are below. Hopefully this helps.
$usecp
  Format:      <HOST>:<SRCDIR> <DSTDIR>
  Description: Specifies which directories should be staged (see TORQUE Data Management)
  Example:     $usecp *.fte.com:/data /usr/local/data
http://www.clusterresources.com/torquedocs21/a.cmomconfig.shtml
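
To make the mapping concrete, here is a minimal shell sketch of how I understand a $usecp rule to be applied: when a file's destination matches <HOST>:<SRCDIR>, pbs_mom can copy it locally into <DSTDIR> instead of using a remote copy. This is only an illustration, not TORQUE's own code. The function name usecp_map and the sample file paths are made up for the example.

```shell
#!/bin/sh
# Hypothetical helper (not part of TORQUE): shows how a $usecp rule
# "<HOST>:<SRCDIR> <DSTDIR>" could map a "host:/path" destination
# onto a local path that a plain cp can reach.
usecp_map() {
    rule_host=$1   # host pattern from the rule, e.g. '*.fte.com'
    rule_src=$2    # source directory prefix, e.g. /data
    rule_dst=$3    # local destination directory, e.g. /usr/local/data
    target=$4      # destination to test, e.g. node1.fte.com:/data/job/out.txt

    t_host=${target%%:*}   # text before the first colon
    t_path=${target#*:}    # text after the first colon

    # The host must match the (possibly wildcarded) host pattern.
    case $t_host in
        $rule_host) ;;
        *) return 1 ;;
    esac

    # The path must be the source directory itself or lie beneath it.
    case $t_path in
        "$rule_src"|"$rule_src"/*) ;;
        *) return 1 ;;
    esac

    # Swap the source prefix for the local destination directory.
    rest=${t_path#"$rule_src"}
    printf '%s\n' "$rule_dst$rest"
}

# Example taken from the documentation entry above (the file name is made up).
usecp_map '*.fte.com' /data /usr/local/data node1.fte.com:/data/job/out.txt
```

With the rule from my own config, calling usecp_map crunch.its.nyu.edu /home /home crunch.its.nyu.edu:/home/user/job.o1 prints /home/user/job.o1, i.e. the path is the same on both sides because /home is shared. A destination on a host that does not match the rule makes the function return non-zero, which corresponds to the case where a remote copy would still be needed.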
Best,
Sreedhar.
On Sep 14, 2011, at 1:18 PM, Zarnoch, Dave wrote:
> Sreedhar.
>
>
>
> Thanks for your suggestion!
>
>
>
> Just a question….
>
> The second line:
>
> $usecp crunch.its.nyu.edu:/home /home
>
>
> Is this because the script that I’m running is located in /home or is the location “/home” used for something else?
>
>
>
> Thanks!
>
>
>
> Dave
>
>
>
> Dave Zarnoch
>
> UNIX Systems Administration
>
> (215)200-0911
>
> Dave.Zarnoch at sykes.com
>
>
>
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Sreedhar Manchu
> Sent: Wednesday, September 14, 2011 1:11 PM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] Batch not running : Things to check
>
>
> Hi Dave,
>
>
>
> This is what I have in my config file.
>
>
>
> $pbsserver crunch.local
>
> $usecp crunch.its.nyu.edu:/home /home
>
> $spool_as_final_name true
>
>
>
> I think you need to mention the second line.
>
>
>
> Best,
>
> Sreedhar.
>
>
>
>
>
>
>
> On Sep 14, 2011, at 12:48 PM, Zarnoch, Dave wrote:
>
>
>
>
> James,
>
>
>
> I tried entering:
>
> qsub -V -I -l nodes=1 -q dn
>
> and it just hangs there
>
>
>
> Do I have a problem with “mom”?
>
> Here’s some files in mom_priv:
>
>
>
> usphl1ora002@/var/spool/torque/mom_priv>ls -l jobs
> total 0
>
> usphl1ora002@/var/spool/torque/mom_priv>more config
> $pbsserver usphl1ora002.amer.sykes.com # note: hostname running pbs_server
> $logevent 255 # bitmap of which events to log
>
>
> usphl1ora002@/var/spool/torque/mom_priv>more mom.lock
> 25994
>
> usphl1ora002@/var/spool/torque/mom_priv>ps -ef | grep 25994 | grep -v grep
> root 25994 1 0 Sep12 ? 00:01:03 /usr/local/sbin/pbs_mom -p
>
>
> Not really familiar with “mom”
>
>
>
> I also don’t have a lot of documentation on Torque…
>
> Do you know of any good web pages?
>
> Thanks!
>
> Dave
>
>
>
> Dave Zarnoch
>
> UNIX Systems Administration
>
> (215)200-0911
>
> Dave.Zarnoch at sykes.com
>
>
>
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Coyle, James J [ITACD]
> Sent: Wednesday, September 14, 2011 12:00 PM
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] Batch not running : Things to check
>
>
> Dave,
>
>
>
> Welcome to Torque. I switched from NQS some time ago, and Torque/PBS has been a good replacement for me.
>
> Things to check:
>
> What does the error output say? (Probably in the file dn_test.txt.e[0-9]*)
>
> Check the permissions on /home/zarnocda/torque/scripts_test/dn_test.sh: is it executable?
>
> You may need: chmod u+x /home/zarnocda/torque/scripts_test/dn_test.sh
>
>
>
> I’d also check if /home/zarnocda/torque/scripts_test/dn_test.sh
>
> even exists on the compute node.
>
> e.g. ls /home/zarnocda/torque/scripts_test/dn_test.sh executable
>
>
>
> I usually use the interactive option (qsub -I) to debug these kinds of problems.
>
> You could issue:
>
> qsub -V -I -l nodes=1 -q dn
>
> which will start an interactive job and log you into the mother superior node for that job,
>
> where you can then try issuing the commands from the job that is not working.
>
>
>
> James Coyle, PhD
> High Performance Computing Group
> Iowa State Univ.
> web: http://jjc.public.iastate.edu/
>
>
>
>
>
>
> From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Zarnoch, Dave
> Sent: Wednesday, September 14, 2011 8:42 AM
> To: torqueusers at supercluster.org
> Subject: [torqueusers] Batch not running
> Importance: High
>
>
> Hello folks,
>
> New to Torque, used to run NQS….
>
> Concerning Torque…
>
> I have a small script:
>
> $ more dn_test.sh
> #!/bin/sh
> #
> PATH=/bin:/usr/bin:/usr/local/bin:/etc:/usr/sbin:/usr/ucb:$HOME/bin:/usr/bin/X11:/sbin:.
> export PATH
> DATE=`date +%H%M`
> echo "Hello"
> touch /tmp/dn_test_${DATE}
> sleep 90
>
> When I submit the script:
>
> qsub -V -l nodes=1 -q dn dn_test.sh
>
> It runs fine.
>
> But I need to run batch…
>
> I created a text file “dn_test.txt”
>
> That contains:
>
> /home/zarnocda/torque/scripts_test/dn_test.sh
>
> When I run:
>
> qsub -V -l nodes=1 -q dn dn_test.txt
>
>
> It appears to process the file:
>
> qstat -s
>
> Job id Name User Time Use S Queue
>
> ------------------------- ---------------- --------------- -------- - -----
>
> 7592.usphl1ora002.amer dn_test.txt zarnocda 0 R dn
>
>
>
> But it doesn’t execute the script within:
>
> /home/zarnocda/torque/scripts_test/dn_test.sh
>
>
> Any help would be appreciated!
>
>
>
> Thanks!
>
>
>
> Dave
>
> Dave Zarnoch
>
> UNIX Systems Administration
>
> (215)200-0911
>
> Dave.Zarnoch at sykes.com
>
>
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers