[torqueusers] Newbie torque script questions

dave first linux4dave at gmail.com
Wed Dec 6 10:15:31 MST 2006

I am such a newbie that I squeak.  I hope this is the correct forum in which
to ask this question.

I want to specify a nodelist other than that which would be $PBS_NODEFILE.
I want to specify n10, n11, n12 and n13, each with 4 processors.  The node
list looks something like this:


And it is called local_nodelist in the working directory.

The script sets PBS_NODEFILE=`pwd`/local_nodelist
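
For reference, a minimal sketch of the kind of script fragment described
above. It is not the actual script: the variable names NODELIST and NPROCS
are hypothetical, and the one "host:cpus" entry per line layout is an
assumption based on the processor list the job prints. Note that assigning
a new value to PBS_NODEFILE inside the job only changes what the script
itself reads afterwards; it cannot change what the server already allocated.

```shell
#!/bin/sh
# Hypothetical sketch: build a custom node list in the working directory
# and, when run inside a job, show the nodes Torque actually assigned.

NODELIST="$(pwd)/local_nodelist"   # file name taken from the post

# One "host:cpus" entry per line (assumed machinefile layout).
for n in n10 n11 n12 n13; do
    echo "${n}:4"
done > "$NODELIST"

# Sum the processor counts that follow the colon on each line.
NPROCS=$(awk -F: '{ total += $2 } END { print total }' "$NODELIST")

echo "Custom node list: $NODELIST ($NPROCS processors)"

# PBS_NODEFILE is only set inside a running job; print it when present.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Nodes Torque allocated:"
    sort -u "$PBS_NODEFILE"
fi
```

Run inside a job, this prints both lists side by side, which makes any
mismatch between the custom file and the real allocation visible.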

Running qstat -f while the script is running shows what seems to be an
erroneous result:

Job Id: 76.excalibur
    Job_Name = pbs_mpich.
    Job_Owner = joeb at excalibur.example.com
    resources_used.cput = 00:00:00
    resources_used.mem = 4296kb
    resources_used.vmem = 175988kb
    resources_used.walltime = 00:00:12
    job_state = R
    queue = default
    server = excalibur.example.com
    Checkpoint = u
    ctime = Wed Dec  6 08:54:16 2006
    Error_Path = excalibur.example.com:/home/joeb/pbs_mpich..e76
    exec_host = n04/0
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Wed Dec  6 08:54:17 2006
    Output_Path = excalibur.example.com:/home/joeb/pbs_mpich..o76
    Priority = 0
    qtime = Wed Dec  6 08:54:16 2006
    Rerunable = True
    Resource_List.nodect = 1
    Resource_List.nodes = 1
    session_id = 31725
    Variable_List = PBS_O_HOME=/home/joeb,PBS_O_LANG=en_US.UTF-8,




    comment = Job started on Wed Dec 06 at 08:54
    etime = Wed Dec  6 08:54:16 2006

However, the script output looks like this:

Job ID: 76.excalibur.example.com
Working directory is /home/joeb
Running on host n04.example.com
Time is Wed Dec 6 08:54:17 PST 2006
Directory is /home/joeb
The node file is /net/fs/home/joeb/local_nodefile
This job runs on the following processors:
n09.example.com:4 n10.example.com:4 n11.example.com:4 n12.example.com:4
This job has allocated 4 nodes/processors.

/usr/local/bin/mpich/x86_64/p4/gnu/bin/mpirun -nolocal -np 4 -machinefile
/net/fs/home/joeb/local_nodefile /usr/local/bin/mpich/p

pi is approximately 3.1416009869231249, Error is 0.0000083333333318
wall clock time = 0.003906

Can anyone explain why the output of qstat -f and the script's echo
statements differ, and how I can determine which is correct (short of making
the job sleep for a while so that I can look for all the processes)?
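
One possible check, sketched here under assumptions, is to compare the host
names in the exec_host value with those in the machinefile handed to mpirun.
The helper names exec_hosts and machinefile_hosts are hypothetical, the
exec_host value is copied from the qstat -f output above, and the
machinefile contents are copied from the processor list the job printed
(assuming one entry per line in the real file). Any host that appears only
in the machinefile was not allocated to the job by the server.

```shell
#!/bin/sh
# Hypothetical helpers: compare exec_host with a machinefile.

# Turn an exec_host value such as "n04/0+n05/1" into one host per line.
exec_hosts() {
    echo "$1" | tr '+' '\n' | sed 's|/.*||' | sort -u
}

# Turn "n09.example.com:4" lines into short host names, one per line.
machinefile_hosts() {
    sed -e 's/:.*//' -e 's/\..*//' "$1" | sort -u
}

EXEC_HOST="n04/0"   # copied from the qstat -f output

# Stand-in machinefile built from the processor list in the job output.
MACHINEFILE=$(mktemp)
printf '%s\n' n09.example.com:4 n10.example.com:4 \
              n11.example.com:4 n12.example.com:4 > "$MACHINEFILE"

# Collect machinefile hosts that are absent from exec_host.
UNALLOCATED=""
for h in $(machinefile_hosts "$MACHINEFILE"); do
    if ! exec_hosts "$EXEC_HOST" | grep -qx "$h"; then
        UNALLOCATED="$UNALLOCATED $h"
    fi
done

echo "In machinefile but not in exec_host:$UNALLOCATED"
rm -f "$MACHINEFILE"
```

With the values from this job, all four machinefile hosts are reported,
since exec_host names only n04.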
