[torquedev] [patch] bind to ip on multihomed pbs_servers
eartoaster at gmx.net
Fri Feb 8 01:26:51 MST 2008
On Thu, Feb 07, 2008 at 11:22:28PM -0800, Garrick Staples wrote:
> > pbs_server does not bind correctly to its assigned hostname/IP (with a
> > hostname on the command line like in
> > '/usr/sbin/pbs_server -a T -h torque.cluster').
> (pst, this is now -H in trunk).
Are the outgoing connections bound to this IP in trunk?
> > This is true both for incoming connections:
> > root@n030:~> lsof -p `pidof pbs_server`
> > COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
> > pbs_serve 1818 root 6u IPv4 1550253 TCP *:15001 (LISTEN)
> > pbs_serve 1818 root 7u IPv4 1550254 UDP *:15001
> > pbs_serve 1818 root 8u IPv4 1550255 UDP *:1023
> We wouldn't want to bind for client connections. They can come from any interface.
OK, this binding is not strictly necessary.
> > and, even worse, for the outgoing ones, i.e. the source ip address of
> > outgoing ip packets seems not to be correctly set to the one extracted from
> > the -h option. The pbs_moms don't like to talk to the server if it uses the
> > wrong source ip.
> Have you added more $pbsserver directives to pbs_mom's config? pbs_mom can
> accept server connections from many IPs.
I know this option, as I used it to work around the missing-bind problem
already in another context:
- pbs_server with two ips on two different subnets attached to the same
- clients on both subnets
- moms need to have pbsserver set depending on which subnet they are in, as
the missing bind lets the client see the server with a different name
> > As it is common and useful in such cases, I use an IP alias, i.e. I assign a
> > second ip to the server's cluster-communication-interface (both ips on the
> > same subnet, so there is only a single route pointing to the interface):
> Have you seen the new HA support in trunk? Multiple pbs_server processes on
> different hosts will use different IPs and pbs_mom won't care.
Yes, I saw it. I want to do a setup like this eventually. But
the problem my patch solves is kind of orthogonal to the HA support. Even
then, if you use IP aliases (or secondary IPs in current Linux terminology),
the source IP is not reliably that of the -(h|H) option, which is IMHO a bug
in torque (as this option looks like it should solve the problem of
multi-homed torque servers).
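
To illustrate what "binding the outgoing connection" means here, below is a
minimal sketch in Python (not Torque's C, and not the attached patch). It
pins the source address of a TCP connection by calling bind() before
connect(), which is the general technique for forcing a specific source IP
on a multi-homed host. The names connect_from and demo are hypothetical, and
the demo only uses loopback addresses so it can run anywhere.

```python
import socket


def connect_from(source_ip, dest_host, dest_port, timeout=5.0):
    """Open a TCP connection whose source address is pinned to source_ip.

    Hypothetical helper for illustration only; it is not part of Torque.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        # Port 0 lets the kernel pick an ephemeral port; only the
        # address part is fixed, so the peer sees source_ip.
        sock.bind((source_ip, 0))
        sock.connect((dest_host, dest_port))
    except OSError:
        sock.close()
        raise
    return sock


def demo():
    """Connect over loopback and return (local IP, IP seen by the peer)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.settimeout(5.0)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    client = connect_from("127.0.0.1", "127.0.0.1", port)
    conn, peer = listener.accept()
    try:
        return client.getsockname()[0], peer[0]
    finally:
        conn.close()
        client.close()
        listener.close()


if __name__ == "__main__":
    print(demo())
```

Without the bind() call the kernel chooses the source address from the
routing table, which on an interface carrying an alias is usually the
primary address rather than the one named on the command line.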