[torqueusers] Re: Newbie torque script questions
dave first
linux4dave at gmail.com
Wed Dec 6 10:32:47 MST 2006
New datapoint - I ran the job with a 2 minute sleep, and found the job
running only on n04, as qstat -f said it would be.
Why wouldn't qsub honor my local node list?
dave
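
The check described above (finding out where the job really runs) can be scripted against qstat -f output. This is only a minimal sketch: exec_hosts is a hypothetical helper name, not a TORQUE command, and it assumes the exec_host value fits on a single line (qstat -f wraps long values, which this does not handle).

```shell
#!/bin/sh
# exec_hosts: hypothetical helper, not part of TORQUE.
# Reads "qstat -f" output on stdin and prints each allocated host once.
# Assumption: the exec_host value is not wrapped onto continuation lines.
exec_hosts() {
    sed -n 's/^[> ]*exec_host = //p' |  # keep only the exec_host value
        tr '+' '\n' |                   # one host/slot entry per line
        sed 's|/.*||' |                 # drop the "/slot" suffix
        sort -u                         # list each host once
}
```

Usage would look like: qstat -f 76 | exec_hosts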
On 12/6/06, dave first <linux4dave at gmail.com> wrote:
>
> I am such a newbie that I squeak. I hope this is the correct forum in
> which to ask this question.
>
> I want to run on a node list other than the one provided in
> $PBS_NODEFILE. Specifically, I want n10, n11, n12 and n13, each with 4
> processors. The node list looks something like this:
>
> n10:4
> n11:4
> n12:4
> n13:4
>
> And it is called local_nodelist in the working directory.
>
> The script sets PBS_NODEFILE=`pwd`/local_nodelist
>
> Running qstat -f while the script is executing shows what seems to be
> an erroneous node list:
>
> Job Id: 76.excalibur
> Job_Name = pbs_mpich.
> Job_Owner = joeb at excalibur.example.com
> resources_used.cput = 00:00:00
> resources_used.mem = 4296kb
> resources_used.vmem = 175988kb
> resources_used.walltime = 00:00:12
> job_state = R
> queue = default
> server = excalibur.example.com
> Checkpoint = u
> ctime = Wed Dec 6 08:54:16 2006
> Error_Path = excalibur.example.com:/home/joeb/pbs_mpich..e76
> exec_host = n04/0
> Hold_Types = n
> Join_Path = n
> Keep_Files = n
> Mail_Points = a
> mtime = Wed Dec 6 08:54:17 2006
> Output_Path = excalibur.example.com:/home/joeb/pbs_mpich..o76
> Priority = 0
> qtime = Wed Dec 6 08:54:16 2006
> Rerunable = True
> Resource_List.nodect = 1
> Resource_List.nodes = 1
> session_id = 31725
> Variable_List = PBS_O_HOME=/home/joeb,PBS_O_LANG=en_US.UTF-8,
> PBS_O_LOGNAME=joeb,
> PBS_O_PATH=/opt/torque/bin:/opt/bin:/opt/hdfview/bin:/opt/hdf/bin:
> /opt/ncarg/bin:/opt/mpich/p4-gnu/bin:/opt/mpiexec//bin:
> /usr/kerberos/bin:/opt/java/jdk1.5.0/bin:/usr/lib64/ccache/bin:
> /usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:
> /opt/java/jdk1.5.0/jre/bin:/opt/visit/bin:/home/joeb/bin:
> /opt/mpich/p4-gnu/sbin,PBS_O_MAIL=/var/spool/mail/joeb,
> PBS_O_SHELL=/bin/bash,PBS_O_HOST=excalibur.example.com,
> PBS_O_WORKDIR=/home/joeb,PBS_O_QUEUE=default
> comment = Job started on Wed Dec 06 at 08:54
> etime = Wed Dec 6 08:54:16 2006
> ---------------------------------------------------------------------------------
>
>
> However, the script output looks like this:
>
> Job ID: 76.excalibur.example.com
> Working directory is /home/joeb
> Running on host n04.example.com
> Time is Wed Dec 6 08:54:17 PST 2006
> Directory is /home/joeb
> The node file is /net/fs/home/joeb/local_nodefile
> This job runs on the following processors:
> n09.example.com:4 n10.example.com:4 n11.example.com:4 n12.example.com:4
> This job has allocated 4 nodes/processors.
>
> /usr/local/bin/mpich/x86_64/p4/gnu/bin/mpirun -nolocal -np 4 -machinefile
> /net/fs/home/joeb/local_nodefile /usr/local/bin/mpich/p4-gnu/examples/cpi
>
> pi is approximately 3.1416009869231249, Error is 0.0000083333333318
> wall clock time = 0.003906
> ---------------------------------------------------------------------------------
>
>
> Can anyone explain why the output of qstat -f and the script's echo
> statements differ, and how I can determine which is correct (short of
> making the job sleep while I look for all the processes)?
>
> Thanks,
> dave
>
>
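
A possible explanation, offered as a sketch rather than a definitive answer: the qstat -f output above shows Resource_List.nodes = 1, so only one node was requested at submission, and n04 is what was allocated. As far as I understand TORQUE, reassigning PBS_NODEFILE inside the script changes what the script and mpirun read, but it does not change the allocation. Specific hosts can instead be requested at submission with -l nodes=. The helper below turns a host:count file such as local_nodelist into that request string. Note that nodes_spec is a hypothetical helper name, not a TORQUE command, and job_script.sh is a placeholder for the real job script.

```shell
#!/bin/sh
# nodes_spec: hypothetical helper, not part of TORQUE.
# $1 is a file with one "host:count" entry per line (e.g. "n10:4").
# Prints a string suitable for "qsub -l nodes=...", e.g. "n10:ppn=4+n11:ppn=4".
nodes_spec() {
    awk -F: 'NF == 2 { printf "%s%s:ppn=%s", sep, $1, $2; sep = "+" }
             END     { print "" }' "$1"
}

# Example use (job_script.sh is a placeholder name):
#   qsub -l nodes="$(nodes_spec local_nodelist)" job_script.sh
```

With the four-line list from the post, this yields n10:ppn=4+n11:ppn=4+n12:ppn=4+n13:ppn=4. Whether the request is granted still depends on those nodes being defined and free on the server.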