Lennart Karlsson Lennart.Karlsson at nsc.liu.se
Tue Sep 6 14:18:31 MDT 2005

Garrick Staples wrote Sep 06:
> On Tue, Sep 06, 2005 at 03:08:58PM +0200, Lennart Karlsson alleged:
> > Torque seems to name nodes in an inconsistent way when using
> > an internal IP network for the compute nodes. Here comes an
> > example:
> > 
> > The PBS server node has hostname "torn", an external IP number named
> > "torn" and an internal IP number named "n0".
> > 
> > A login node has hostname "tornado", an external IP number named
> > "tornado" and an internal IP number named "l1".
> The hostnames start out inconsistent so TORQUE is going to have a hard
> time.

Thanks Garrick for commenting on my improvement suggestion!

No, I do not see that the hostnames start out in an inconsistent way.
It is quite normal to let the hostnames follow the name of the external
IP number, with simple names or FQDNs.

I noticed that the PBS server recognized that the job owner communicated
over the internal IP network, thus using the "l1" name, and I hoped that
the other host name data would also use this "l1" name, for consistency.
In that way I could use reasonable host files on the compute nodes,
that did not mention the external interface (IP number or its corresponding
name) of the submit hosts.
> > Communication between the login node, PBS server node and compute nodes
> > runs on the internal IP network at all times, and thus I appreciate
> > that the "Job_Owner" data actually mentions the internal host name "l1".
> > 
> > But otherwise it seems like all other job data are set to the external
> > name "tornado": Error_Path, Output_Path, and PBS_O_HOST. I also
> > have noted that the mom_superior (first node in job) tries to make
> > a "qsub sock" connection to the external IP interface of the login node.
> > 
> > It would be much better if all these host address references went to the
> > internal IP addresses, i.e. if the host address reference in the "Job_Owner"
> > data field was used also in those other places, because these host addresses
> > will be used on the compute nodes. (Trying to reach their external IP
> > addresses will probably fail, due to routing problems and/or firewalls.)
> > 
> > I would like this change to Torque, please.
> > 
> > Can this be made the default behavior, without wreaking havoc
> > with other, existing installations?
> Inside of pbs_server, there can only be one "server name".  Fortunately
> this is configurable with the SERVERHOST parameter in torque.cfg:
> http://www.clusterresources.com/products/torque/docs20/a.ktorquecfg.shtml#serverhost

Yes thanks, I had noticed that.
> > The second best alternative would be to configure into the pbs_server
> > configuration the preferred host names to use for different submit hosts.
> > In the pbs_server configuration file torque.cfg you may change the way
> > the PBS server host presents itself IP-wise, but (as far as I understand)
> > not the way other submit hosts present themselves.
> This is handled by the $PBSHOME/server_name file on the submitting
> hosts.

Are you really correct here? I believe that the $PBSHOME/server_name only
tells the PBS user programs (like qsub and qstat) on the job submit host where
to find the PBS server. It does not give the job submit host the option
to present itself to the PBS server with the name of its internal IP number.

I would like the job submit hosts, like the host "tornado" in the example 
to present themselves with the appropriate (internal) IP number (or its name).

My wishful thinking is that the PBS server could pick up this information
by itself, by making an intelligent guess out of seeing that the submit
host communicates on the internal network (as of now, it is right-out

More realistic is perhaps that I can configure my way out of this.
I would be happy with some configuration file options (or, worse but better
than nothing, qmgr interface options) that specify which interfaces
to use on all nodes. Or with a companion file to the $PBSHOME/server_name
file (perhaps $PBSHOME/my_name?) on every job submit host?
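
To make the idea concrete, here is a minimal Python sketch of the behavior
I am asking for. It is not Torque code and does not use any Torque API. The
function names, the name table and all IP addresses are hypothetical; only
the host names "torn", "n0", "tornado" and "l1" come from my example. The
sketch picks the submit host name from the address the connection arrived
on, and then uses that same name in every job field:

```python
import ipaddress

# Hypothetical internal cluster network and name table. The addresses are
# made up for illustration; only the host names come from the example.
INTERNAL_NET = ipaddress.ip_network("10.0.0.0/24")
NAMES_BY_ADDRESS = {
    "10.0.0.1": "n0",         # PBS server, internal interface
    "10.0.0.11": "l1",        # login node, internal interface
    "192.0.2.10": "torn",     # PBS server, external interface
    "192.0.2.11": "tornado",  # login node, external interface
}


def submit_host_name(peer_address, reported_hostname):
    """Pick the host name to record for a job's submit host.

    If the connection arrived over the internal network and a name is known
    for that address, use it. Otherwise fall back to the hostname that the
    submit host reported about itself.
    """
    addr = ipaddress.ip_address(peer_address)
    if addr in INTERNAL_NET:
        return NAMES_BY_ADDRESS.get(peer_address, reported_hostname)
    return reported_hostname


def job_fields(user, peer_address, reported_hostname, workdir):
    """Build the host-related job fields from one consistently chosen name."""
    name = submit_host_name(peer_address, reported_hostname)
    return {
        "Job_Owner": f"{user}@{name}",
        "PBS_O_HOST": name,
        "Output_Path": f"{name}:{workdir}/job.out",
        "Error_Path": f"{name}:{workdir}/job.err",
    }
```

With this table, a job submitted from "tornado" over the internal network
would get "l1" in all four fields, while a submission arriving over the
external interface would keep "tornado".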

Best regards,
-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
   National Supercomputer Centre in Linkoping, Sweden

