[torquedev] [patch] bind to ip on multihomed pbs_servers

Henning Glawe eartoaster at gmx.net
Fri Feb 8 01:26:51 MST 2008

On Thu, Feb 07, 2008 at 11:22:28PM -0800, Garrick Staples wrote:
> > pbs_server does not bind correctly to its assigned hostname/IP (with a
> > hostname on the command line like in
> > '/usr/sbin/pbs_server -a T -h torque.cluster').
> (psst, this is now -H in trunk).

Are the outgoing connections bound to this IP in trunk?

> > This is true both for incoming connections:
> > 
> > root@n030:~> lsof -p `pidof pbs_server`
> > pbs_serve 1818 root    6u  IPv4 1550253             TCP *:15001 (LISTEN)
> > pbs_serve 1818 root    7u  IPv4 1550254             UDP *:15001
> > pbs_serve 1818 root    8u  IPv4 1550255             UDP *:1023
> We wouldn't want to bind for client connections.  They can come from any interface.

OK, this binding is not strictly necessary.
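
For illustration only, here is a minimal Python sketch (not torque code) of the difference between a wildcard listener, which lsof shows as *:port like in the output above, and one bound to a single address. The helper name listen_on and the loopback address are assumptions made for the example.

```python
import socket


def listen_on(address, port=0):
    """Return a TCP listener bound to `address` (hypothetical helper).

    "0.0.0.0" is the IPv4 wildcard: connections are accepted on every
    interface. A concrete address restricts the listener to that one IP.
    Port 0 asks the kernel to pick a free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((address, port))
    sock.listen(5)
    return sock


if __name__ == "__main__":
    wildcard = listen_on("0.0.0.0")
    pinned = listen_on("127.0.0.1")
    # getsockname() reports the local address each listener is bound to.
    print(wildcard.getsockname()[0], pinned.getsockname()[0])
    wildcard.close()
    pinned.close()
```

As said, for incoming client connections the wildcard form is acceptable; the sketch only shows what "binding to the assigned IP" would mean for a listener.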

> > and, even worse, for the outgoing ones, i.e. the source ip address of
> > outgoing ip packets seems not to be correctly set to the one extracted from
> > the -h option. The pbs_moms don't like to talk to the server if it uses the
> > wrong source ip.
> Have you added more $pbsserver directives to pbs_mom's config?  pbs_mom can
> accept server connections from many IPs.

I know this option; I have already used it to work around the missing-bind
problem in another context:
- pbs_server with two ips on two different subnets attached to the same
- clients on both subnets
- moms need to have pbsserver set depending on which subnet they are in, as
  the missing bind lets the client see the server with a different name

> > As it is common and useful in such cases, I use an IP alias, i.e. I assign a
> > second ip to the server's cluster-communication-interface (both ips on the
> > same subnet, so there is only a single route pointing to the interface):
> Have you seen the new HA support in trunk?  Multiple pbs_server processes on
> different hosts will use different IPs and pbs_mom won't care.

Yes, I saw it, and I want to move to a setup like this eventually. But
the problem my patch solves is orthogonal to the HA support. Even
with HA, if you use IP aliases (or secondary IPs in current Linux terminology),
the source IP is not reliably that of the -(h|H) option, which is IMHO a bug
in torque (as this option looks like it should solve the problem of
multi-homed torque servers).
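
To make the idea concrete, here is a minimal Python sketch of pinning the source address of an outgoing TCP connection by calling bind() before connect(). This is not the patch and not torque's C code; the function name connect_from and the addresses in the usage lines are assumptions chosen for the example.

```python
import socket


def connect_from(source_ip, dest_host, dest_port, timeout=5.0):
    """Open a TCP connection whose source address is pinned to source_ip.

    Hypothetical helper. Without the bind() call the kernel picks the
    source address from the routing table, which on a host with an IP
    alias is usually the primary address of the interface.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        # Port 0: the kernel chooses the source port, only the IP is fixed.
        sock.bind((source_ip, 0))
        sock.connect((dest_host, dest_port))
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    # Self-contained demo on loopback: the listener stands in for a peer
    # that checks the source address of the incoming connection.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    client = connect_from("127.0.0.1", "127.0.0.1", port)
    conn, peer = listener.accept()
    print("peer sees source address", peer[0])

    for s in (client, conn, listener):
        s.close()
```

In Python the standard library offers the same behaviour through the source_address argument of socket.create_connection(). The sketch leaves out anything specific to torque, such as the use of privileged source ports.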

c u