[torquedev] torque server: setting the server name

Ken Nielson knielson at adaptivecomputing.com
Mon Jun 4 09:09:37 MDT 2012


Martin,

I have added your problem description to our internal ticket system so we
don't lose this information.

We definitely need a better way to handle server names, especially on
multi-homed systems.

Thanks

Ken

On Fri, Jun 1, 2012 at 5:20 PM, Martin Siegert <siegert at sfu.ca> wrote:

> Hi,
>
> moving this to the dev list ...
>
> On Tue, May 29, 2012 at 01:36:15PM -0700, Martin Siegert wrote:
> > Hi David,
> >
> > I will definitely add --with-tcp-retry-limit=5 to my configure options,
> > since we did run into exactly that situation. However, the current
> > situation is due to an ip mismatch between private and public ip address
> > of the torque server: svr_connect.c, line 172
> >
> >   if ((hostaddr == pbs_server_addr) && (port == pbs_server_port_dis))
> >     {
> >     return(PBS_LOCAL_CONNECTION); /* special value for local */
> >     }
> >
> > In our case: hostaddr = 172.18.1.0 and pbs_server_addr = 206.12.24.2.
> > The former ip address is the (correct) ip address on the internal
> > cluster network, the latter ip address is the public ip address and
> > should not be used by torque anywhere.
> >
> > We have in /etc/hosts
> >
> > 172.18.1.0 b0
> >
> > and then set the server name in 4 (!!) different places:
> > 1) in qmgr we have
> > set server server_name = b0
> > 2) /var/spool/torque/server_name contains b0
> > 3) /var/spool/torque/torque.cfg contains
> > SERVERHOST b0
> > 4) we configure with
> > --with-default-server=b0
> >
> > I always thought that it should be sufficient to set this once.
> > Obviously I am wrong ... I am missing at least a fifth spot where
> > I need to set this: how do I get torque server to set pbs_server_addr
> > in svr_connect to 172.18.1.0?
> >
> > For now we used the following workaround:
> > 1) in /etc/hosts set
> >
> > 172.18.1.0 hostname.domain.ca hostname b0
> >
> > 2) restart torque server and wait a few seconds until qstat, etc.
> > responds.
> >
> > 3) change /etc/hosts back to
> > 172.18.1.0 b0
> >
> > This does "solve" the problem for now.
> > I am still looking for a more permanent solution.
>
> I did miss a fifth (and actually 6th) way of setting the server name:
>
> 5) start the server with the -H b0 commandline option.
>
> As it turns out this is the only way. Methods 1-4 have no effect.
>
> At this point I am wondering why we need 5 ways of setting the server
> name. As a first step can somebody tell me what each of the 5 settings
> accomplishes?
>
> This is my take:
>
> 1) in qmgr:
>
> set server server_name = b0
>
> As far as I can tell this has no effect. Can this be eliminated?
>
> 2) /var/spool/torque/server_name
>
> This is essential: used by the clients (qsub, qstat, etc.) and also by
> the mom (if no $pbsserver is specified in mom_priv/config). Not used
> by the torque server.
>
> 3) torque.cfg
> SERVERHOST b0
>
> Read by qsub only. The man page says:
> SERVERHOST specifies the value for the PBS_SERVER environment variable
>
> I find this confusing: why would you want to set that environment variable
> to something different than what is read from the server_name file?
> In other words: what is the use case for having SERVERHOST set to something
> different than what is in the server_name file?
>
> Is it safe to say that this is not needed when the server_name file is in
> place?
>
> 4) configure option --with-default-server=b0
> Does this have any effect?
>
> 5) pbs_server -H b0 commandline option
> Essential. Determines the IP address to be used for the server.
> If not used, gethostname is used to determine the IP address.
>
> 6) $pbsserver setting in mom_priv/config
> Used by the mom for connecting to server; not needed when
> server_name file is in place.
>
> Is my assessment correct that only (2) and (5) are really needed?
> Furthermore, (1) and (4) and possibly (3) do not serve any purpose?
>
> Cheers,
> Martin
>
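For anyone reading this in the archive, here is a minimal sketch of how item 5 (-H versus gethostname) and the svr_connect.c comparison quoted above fit together. It is not the TORQUE source: the function names, the SKETCH_* constants, and the port 15001 used in the comments are assumptions made for illustration. Only the shape of the address-and-port test and the two addresses come from Martin's message.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-ins; the real code returns PBS_LOCAL_CONNECTION. */
#define SKETCH_LOCAL  1
#define SKETCH_REMOTE 0

/* Hypothetical helper: choose the name whose address the server treats as
 * its own. Per the thread, a -H argument wins; otherwise gethostname() is
 * used, which on a multi-homed host may map to the public interface. */
static int pick_server_name(const char *h_option, char *buf, size_t len)
{
  if (buf == NULL || len == 0)
    return -1;

  if (h_option != NULL && h_option[0] != '\0')
    {
    if (strlen(h_option) >= len)
      return -1;                      /* name does not fit in buf */
    strcpy(buf, h_option);
    return 0;
    }

  if (gethostname(buf, len) != 0)
    return -1;

  buf[len - 1] = '\0';                /* gethostname may not terminate */
  return 0;
}

/* Parse a dotted-quad string into a host-byte-order address. */
static int parse_ipv4(const char *text, uint32_t *out)
{
  struct in_addr a;

  if (inet_pton(AF_INET, text, &a) != 1)
    return -1;

  *out = ntohl(a.s_addr);
  return 0;
}

/* Same shape as the quoted test: the connection counts as local only when
 * both the address and the port match the server's own values. */
static int classify_connection(uint32_t hostaddr, unsigned int port,
                               uint32_t server_addr, unsigned int server_port)
{
  if ((hostaddr == server_addr) && (port == server_port))
    return SKETCH_LOCAL;

  return SKETCH_REMOTE;
}
```

With hostaddr parsed from 172.18.1.0 and the server address parsed from 206.12.24.2, classify_connection() reports a remote connection even when the ports agree (say, both 15001), which is the mismatch described above. If the server address is instead derived from the name given with -H (b0, which /etc/hosts maps to 172.18.1.0), the two values match and the local case is taken.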