[torqueusers] PBS_NODEFILE incomplete (entries for last(?) node only)

Gus Correa gus at ldeo.columbia.edu
Thu Oct 7 17:09:44 MDT 2010


Hi Gordon

Some guesses:

1) Do you have mom daemons running on the nodes?
I.e. on the nodes, what is the output of "service pbs status" or 
"service pbs_mom status"?
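A quick way to check them all from the head node is a small ssh loop.
A sketch (the node names node01..node03 are placeholders for your own;
drop the leading "echo" to actually run the check over ssh):

```shell
# Print the command that would be run on each node; remove "echo"
# to execute it for real. Node names here are examples only.
for n in node01 node02 node03; do
    echo ssh "$n" "service pbs_mom status"
done
```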

2) Do your mom daemons on the nodes point to the server?
I.e. what is the content of $TORQUE/mom_priv/config?
Is it consistent with the server name in $TORQUE/server_name ?
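For reference, a minimal mom_priv/config usually only needs to name
the server (the hostname below is an example, not your actual server):

```
# $TORQUE/mom_priv/config on each compute node
$pbsserver    headnode.cluster.local
$logevent     255
```

The name after $pbsserver should match what is in $TORQUE/server_name
on the head node.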

3) What is the content of your /etc/hosts file on the head node
and on each node?
Are they the same?
Are they consistent with your nodes file
(head_node:$TORQUE/server_priv/nodes), i.e. do they use the same host
names that have IP addresses listed in /etc/hosts?
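The cross-check can be scripted. A sketch that uses throwaway sample
files so it runs anywhere; on a real system point NODES and HOSTS at
$TORQUE/server_priv/nodes and /etc/hosts (file contents below are
made-up examples):

```shell
# Verify every host in the nodes file has an /etc/hosts entry.
NODES=$(mktemp); HOSTS=$(mktemp)
printf 'node01 np=2\nnode02 np=2\n' > "$NODES"       # sample nodes file
printf '10.0.0.1 headnode\n10.0.0.2 node01\n' > "$HOSTS"  # sample hosts

while read -r name _; do
    grep -qw "$name" "$HOSTS" || echo "missing from hosts: $name"
done < "$NODES"

rm -f "$NODES" "$HOSTS"
```

With the sample data above it reports node02 as missing, which is
exactly the kind of mismatch that can break the node file.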

4) Are you really using the Internet to connect the nodes,
as the fqdn names on your nodes file (sent in an old email) suggest?
(I can't find it, maybe you can post it again.)
Or are you using a private subnet?

5) Did you try to run hostname via mpirun on all nodes?
I.e., something like this:

...
#PBS -l nodes=8:ppn=2
...
mpirun -np 16 hostname
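It can also help to inspect $PBS_NODEFILE from inside the job script
itself. A sketch of the check (it fabricates a nodefile showing the
symptom you describe, so it runs standalone; in a real job just use
the $PBS_NODEFILE that pbs_mom provides):

```shell
# Simulate the reported symptom: a nodefile listing only the last node.
PBS_NODEFILE=$(mktemp)
printf 'node08\nnode08\n' > "$PBS_NODEFILE"

# A healthy -l nodes=8:ppn=2 job should show 16 entries, 8 unique hosts.
echo "entries: $(wc -l < "$PBS_NODEFILE")"
echo "unique hosts: $(sort -u "$PBS_NODEFILE" | wc -l)"

rm -f "$PBS_NODEFILE"
```

If "unique hosts" is 1 while you asked for 8 nodes, the server is not
assembling the node list correctly.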


I hope this helps,
Gus Correa

Gordon Wells wrote:
> I've tried that, unfortunately I never get a $PBS_NODEFILE that spans 
> more than one node.
> 
> -- max(∫(εὐδαιμονία)dt)
> 
> Dr Gordon Wells
> Bioinformatics and Computational Biology Unit
> Department of Biochemistry
> University of Pretoria
> 
> 
> On 7 October 2010 10:02, Vaibhav Pol <vaibhavp at cdac.in 
> <mailto:vaibhavp at cdac.in>> wrote:
> 
>      Hi,
>      you must set the server as well as the queue attribute:
> 
>          set server resources_available.nodect = (number of nodes *
>          cpus per node)
>          set queue <queue name> resources_available.nodect = (number
>          of nodes * cpus per node)
> 
> 
>      Thanks and regards,
>      Vaibhav Pol
>      National PARAM Supercomputing Facility
>      Centre for Development of Advanced Computing
>      Ganeshkhind Road
>      Pune University Campus
>      PUNE-Maharastra
>      Phone +91-20-25704176 ext: 176
>      Cell Phone :  +919850466409
> 
> 
> 
>     On Thu, 7 Oct 2010, Gordon Wells wrote:
> 
>         Hi
> 
>         I've now tried torque 2.5.2 as well, same problems.
>         Setting resources_available.nodect has no effect except allowing
>         me to use
>         "-l nodes=x" with x > 14
> 
>         regards
> 
>         -- max(∫(εὐδαιμονία)dt)
> 
>         Dr Gordon Wells
>         Bioinformatics and Computational Biology Unit
>         Department of Biochemistry
>         University of Pretoria
> 
> 
>         On 6 October 2010 20:04, Glen Beane <glen.beane at gmail.com
>         <mailto:glen.beane at gmail.com>> wrote:
> 
>             On Wed, Oct 6, 2010 at 1:12 PM, Gordon Wells
>             <gordon.wells at gmail.com <mailto:gordon.wells at gmail.com>>
>             wrote:
> 
>                 Can I confirm that this will definitely fix the problem?
>                 Unfortunately this cluster also needs to be glite
>                 compatible, and 2.3.6 seems to be the latest that will
>                 work.
> 
> 
>             I'm not certain...  do you happen to have the server
>             attribute resources_available.nodect set?  I have seen bugs
>             with PBS_NODEFILE contents when this server attribute is
>             set.  This may be a manifestation of that bug, and I'm not
>             sure whether it has been corrected.
> 
>             Try unsetting it and submitting a job with -l nodes=X:ppn=Y.
>             _______________________________________________
>             torqueusers mailing list
>             torqueusers at supercluster.org
>             <mailto:torqueusers at supercluster.org>
>             http://www.supercluster.org/mailman/listinfo/torqueusers
> 
> 
>         -- 
>         This message has been scanned for viruses and
>         dangerous content by MailScanner, and is
>         believed to be clean.
> 
> 
>     -- 
>     This message has been scanned for viruses and
>     dangerous content by MailScanner, and is
>     believed to be clean.
> 
> 
> 
> 
> 
> ------------------------------------------------------------------------
> 



More information about the torqueusers mailing list