[torqueusers] Re: Newbie torque script questions

dave first linux4dave at gmail.com
Fri Dec 8 11:21:11 MST 2006


All,

Thanks for ALL your input.  I see I have a lot to learn about the hows and
whys of Torque and job submission.  I will look at using mpiexec - thanks
for the link.  I have been using mpirun because that is what I was
introduced to first.

dave

On 12/7/06, Jerry Smith <jdsmit at sandia.gov> wrote:
>
>  Dave,
>
> Try this in your pbs_script:
>
> -l nodes=n10:ppn=4+n11:ppn=4+n12:ppn=4+n13:ppn=4
>
> Make sure your $PBS_HOME/server_priv/nodes looks like
>
> n10 np=4
> n11 np=4
> ..
> ..
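>
> For illustration, a minimal submission script using that request might
> look like the sketch below (the walltime and the path to the cpi example
> are just placeholders, not taken from your setup):
>
>   #!/bin/bash
>   #PBS -N pbs_mpich
>   #PBS -l nodes=n10:ppn=4+n11:ppn=4+n12:ppn=4+n13:ppn=4
>   #PBS -l walltime=00:10:00
>
>   cd $PBS_O_WORKDIR
>   # Torque writes the hosts it allocated to $PBS_NODEFILE,
>   # one line per processor, so 16 lines for this request.
>   cat $PBS_NODEFILE
>   mpirun -np 16 -machinefile $PBS_NODEFILE ./examples/cpi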
>
>
> Just a follow-up: are you wanting to get 4 nodes with 4 processors each, and
> use only 1 processor per node?  Your original mpirun line only asks for 4
> processors to run on (which n10 alone already has).
>
> If you want to use all the processors on all 4 nodes, you would want to use
> -np 16.
>
> -nolocal means you do not want to run processes on the controlling
> pbs_mom (n10 in this scenario), so you are really only getting 12 of the 16
> processors.
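>
> Concretely, with that node file the two lines below would behave
> differently (again just a sketch, with a placeholder program path):
>
>   # uses all 16 slots, including the 4 on the controlling node n10
>   mpirun -np 16 -machinefile $PBS_NODEFILE ./examples/cpi
>
>   # -nolocal skips the node mpirun is started from (n10), so only the
>   # 12 slots on n11-n13 are actually usable
>   mpirun -nolocal -np 12 -machinefile $PBS_NODEFILE ./examples/cpi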
>
> My other suggestion is to build Pete Wyckoff's mpiexec and use it in place
> of mpirun, as it has many advantages (simpler usage, different flags, tight
> integration with the Torque job spawn, etc.):
> http://www.osc.edu/~pw/mpiexec/index.php
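>
> Once mpiexec is built against Torque, the job script gets simpler, since
> it takes the node list and processor count straight from the job itself
> (a sketch, assuming a default build):
>
>   # no -np or -machinefile needed: mpiexec asks Torque's TM interface
>   # for the slots the job was allocated and starts one rank per slot
>   mpiexec ./examples/cpi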
>
>
>
> Jerry Smith
> -----------------------------------
> Sandia National Labs
> Infrastructure Computing Systems
>
>
> ------------------------------
> From: dave first <linux4dave at gmail.com>
> Date: Wed, 6 Dec 2006 09:32:47 -0800
> To: <torqueusers at supercluster.org>
> Subject: [torqueusers] Re: Newbie torque script questions
>
> New data point: I ran the job with a 2-minute sleep, and found the job
> running only on n04, just as qstat -f said it would be.
>
> Why wouldn't qsub honor my local node list?
>
> dave
>
> On 12/6/06, dave first <linux4dave at gmail.com> wrote:
>
> I am such a newbie that I squeak.  I hope this is the correct forum in
> which to ask this question.
>
> I want to specify a node list other than the one Torque provides in
> $PBS_NODEFILE.  I want to use n10, n11, n12 and n13, each with 4
> processors.  The node list looks something like this:
>
> n10:4
> n11:4
> n12:4
> n13:4
>
> And it is called local_nodelist in the working directory.
>
> The script sets PBS_NODEFILE=`pwd`/local_nodelist
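>
> (For context, the relevant part of the script, trimmed down, looks roughly
> like this; the echo statements produce the output quoted further down:)
>
>   PBS_NODEFILE=`pwd`/local_nodelist
>   export PBS_NODEFILE
>
>   echo "The node file is $PBS_NODEFILE"
>   echo "This job runs on the following processors:"
>   cat $PBS_NODEFILE
>
>   /usr/local/bin/mpich/x86_64/p4/gnu/bin/mpirun -nolocal -np 4 \
>       -machinefile $PBS_NODEFILE /usr/local/bin/mpich/p4-gnu/examples/cpi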
>
> Running qstat -f while the script is running shows what seems to be an
> erroneous node list:
>
> Job Id: 76.excalibur
>     Job_Name = pbs_mpich.
>     Job_Owner = joeb at excalibur.example.com
>     resources_used.cput = 00:00:00
>     resources_used.mem = 4296kb
>     resources_used.vmem = 175988kb
>     resources_used.walltime = 00:00:12
>     job_state = R
>     queue = default
>     server = excalibur.example.com
>     Checkpoint = u
>     ctime = Wed Dec  6 08:54:16 2006
>     Error_Path = excalibur.example.com:/home/joeb/pbs_mpich..e76
>     exec_host = n04/0
>     Hold_Types = n
>     Join_Path = n
>     Keep_Files = n
>     Mail_Points = a
>     mtime = Wed Dec  6 08:54:17 2006
>     Output_Path = excalibur.example.com:/home/joeb/pbs_mpich..o76
>     Priority = 0
>     qtime = Wed Dec  6 08:54:16 2006
>     Rerunable = True
>     Resource_List.nodect = 1
>     Resource_List.nodes = 1
>     session_id = 31725
>     Variable_List = PBS_O_HOME=/home/joeb,PBS_O_LANG=en_US.UTF-8,
>         PBS_O_LOGNAME=joeb,
>         PBS_O_PATH=/opt/torque/bin:/opt/bin:/opt/hdfview/bin:/opt/hdf/bin:
>         /opt/ncarg/bin:/opt/mpich/p4-gnu/bin:/opt/mpiexec//bin:
>         /usr/kerberos/bin:/opt/java/jdk1.5.0/bin:/usr/lib64/ccache/bin:
>         /usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:
>         /opt/java/jdk1.5.0/jre/bin:/opt/visit/bin:/home/joeb/bin:
>         /opt/mpich/p4-gnu/sbin,PBS_O_MAIL=/var/spool/mail/joeb,
>         PBS_O_SHELL=/bin/bash,PBS_O_HOST=excalibur.example.com,
>         PBS_O_WORKDIR=/home/joeb,PBS_O_QUEUE=default
>     comment = Job started on Wed Dec 06 at 08:54
>     etime = Wed Dec  6 08:54:16 2006
> ---------------------------------------------------------------------------------
>
>
> However, the script output looks like this:
>
> Job ID: 76.excalibur.example.com
> Working directory is /home/joeb
> Running on host n04.example.com
> Time is Wed Dec 6 08:54:17 PST 2006
> Directory is /home/joeb
> The node file is /net/fs/home/joeb/local_nodefile
> This job runs on the following processors:
> n09.example.com:4
> n10.example.com:4
> n11.example.com:4
> n12.example.com:4
> This job has allocated 4 nodes/processors.
>
> /usr/local/bin/mpich/x86_64/p4/gnu/bin/mpirun -nolocal -np 4 -machinefile
> /net/fs/home/joeb/local_nodefile /usr/local/bin/mpich/p4-gnu/examples/cpi
>
> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
> wall clock time = 0.003906
> ---------------------------------------------------------------------------------
>
>
> Can anyone explain why the output of qstat -f and the script's echo
> statements differ, and how I can determine which is correct?  (Short of
> making the job sleep for a while while I look for all the processes?)
>
> Thanks,
> dave
>
>
>
> ------------------------------
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>