[torqueusers] configuring machine as both server and compute node - interface name confusion

David Beer dbeer at adaptivecomputing.com
Wed Mar 19 16:38:25 MDT 2014


Lev,

The mom reports the hostname returned to it by the gethostname() call. That
is the name the server sees (judging by your log, "master"), regardless of
the name listed in the nodes file.

A simple workaround is to pass the -A option when starting pbs_mom, giving
it the node's name exactly as it appears in the nodes file:

pbs_mom -A <node name in nodes file>


On Wed, Mar 19, 2014 at 3:15 PM, Lev Givon <lev at columbia.edu> wrote:

> I'm trying to configure a system running Ubuntu 13.10 (x86_64) and torque
> 4.5.0pre1 (manually compiled and installed) to serve as both a torque server
> and a compute node. This machine has both a public and an internal network
> interface; the latter is connected to a private network (192.168.0.0/16)
> that communicates with other Ubuntu 13.10 systems (each with a single
> interface attached to the private network) that will eventually be added
> to the torque configuration as compute nodes. I've configured the system
> to set the hostname associated with its internal interface (node01.local)
> using avahi (zeroconf), and I've verified that I can use this hostname to
> access the system on the internal network. I used this hostname in the
> pbs_server and pbs_mom configurations (i.e., /var/spool/torque/torque.cfg,
> /var/spool/torque/mom_priv/config, /var/spool/torque/server_priv/nodes, and
> /var/spool/torque/server_priv/serverdb). When I start all of the torque
> daemons (pbs_server, pbs_sched, pbs_mom, and trqauthd), however, it seems
> that pbs_server tries to use the name associated with the external
> interface (master) despite what is specified in the config files (excerpt
> from the server logs):
>
> 03/19/2014 14:50:31;0006;PBS_Server.1913;Svr;PBS_Server;Using ports Server:15001 Scheduler:15004  MOM:15002 (server: 'master.ee.columbia.edu')
> ..
> 03/19/2014 14:51:01;0001;PBS_Server.1920;Svr;PBS_Server;LOG_ERROR::get_node_from_str, Node node01.local is reporting on node master, which pbs_server doesn't know about
>
> Any ideas as to why the name associated with the external interface is
> being used even though it is not specified anywhere in the torque
> configuration? Resolving the node01.local name via gethostbyname() returns
> the address of the internal interface because nsswitch.conf is configured
> to look at mdns when resolving names.
> --
> Lev Givon
> Bionet Group
> http://www.columbia.edu/~lev/
> http://lebedov.github.io/
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>



-- 
David Beer | Senior Software Engineer
Adaptive Computing

