[torquedev] torque server: setting the server name

Martin Siegert siegert at sfu.ca
Fri Jun 1 17:20:04 MDT 2012


Hi,

moving this to the dev list ...

On Tue, May 29, 2012 at 01:36:15PM -0700, Martin Siegert wrote:
> Hi David,
> 
> I will definitely add --with-tcp-retry-limit=5 to my configure options,
> since we did run into exactly that situation. However, the current
> situation is due to a mismatch between the private and public IP addresses
> of the torque server (svr_connect.c, line 172):
> 
>   if ((hostaddr == pbs_server_addr) && (port == pbs_server_port_dis))
>     {
>     return(PBS_LOCAL_CONNECTION); /* special value for local */
>     }
> 
> In our case hostaddr = 172.18.1.0 and pbs_server_addr = 206.12.24.2.
> The former is the (correct) IP address on the internal cluster
> network; the latter is the public IP address and should not be used
> by torque anywhere.
> 
> We have in /etc/hosts
> 
> 172.18.1.0 b0
> 
> and then set the server name in 4 (!!) different places:
> 1) in qmgr we have
> set server server_name = b0
> 2) /var/spool/torque/server_name contains b0
> 3) /var/spool/torque/torque.cfg contains
> SERVERHOST b0
> 4) we configure with
> --with-default-server=b0
> 
> I always thought that it should be sufficient to set this once.
> Obviously I am wrong: I am missing at least a fifth place where
> I need to set this. How do I get the torque server to set
> pbs_server_addr in svr_connect.c to 172.18.1.0?
> 
> For now we used the following workaround:
> 1) in /etc/hosts set
> 
> 172.18.1.0 hostname.domain.ca hostname b0
> 
> 2) restart the torque server and wait a few seconds until qstat etc.
> respond.
> 
> 3) change /etc/hosts back to
> 172.18.1.0 b0
> 
> This does "solve" the problem for now.
> I am still looking for a more permanent solution.

I did miss a fifth (and actually a sixth) way of setting the server name:

5) start the server with the -H b0 command-line option.

As it turns out, this is the only way that works; methods 1-4 have no
effect on the address the server uses.

At this point I am wondering why we need five ways of setting the server
name. As a first step, can somebody tell me what each of the five settings
accomplishes?

This is my take:

1) in qmgr:

set server server_name = b0

As far as I can tell this has no effect. Can this be eliminated?

2) /var/spool/torque/server_name

This is essential: used by the clients (qsub, qstat, etc.) and also by
the mom (if no $pbsserver is specified in mom_priv/config). Not used
by the torque server.

3) torque.cfg
SERVERHOST b0

Read by qsub only. The man page says:
SERVERHOST specifies the value for the PBS_SERVER environment variable

I find this confusing: why would you want to set that environment variable
to something different from what is read from the server_name file?
In other words: what is the use case for having SERVERHOST set to something
different from what is in the server_name file?

Is it safe to say that this is not needed when the server_name file is in
place?

4) configure option --with-default-server=b0
Does this have any effect?

5) pbs_server -H b0 command-line option
Essential: determines the IP address the server uses.
If it is not given, the server takes the name returned by gethostname()
and resolves it to determine the IP address.

6) $pbsserver setting in mom_priv/config
Used by the mom to connect to the server; not needed when the
server_name file is in place.

Is my assessment correct that only (2) and (5) are really needed, and
that (1), (4), and possibly (3) serve no purpose?

Cheers,
Martin

