[torquedev] [patch] bind to ip on multihomed pbs_servers
Toni L. Harbaugh-Blackford [Contr]
harbaugh at ncifcrf.gov
Fri Feb 8 01:40:21 MST 2008
On Fri, 8 Feb 2008, Henning Glawe wrote:
> On Fri, Feb 08, 2008 at 03:01:26AM -0500, Toni L. Harbaugh-Blackford [Contr] wrote:
> > I also have a patch, but mine is more invasive, modifying the svr_connect()
> > and client_to_svr() functions by adding the ip address to bind to as a passed
> > argument. Your patch is much simpler, so I hope it makes it in.
> well, my patch is more a proof-of-concept, as it is an unclean solution
> communicating the IP to the relevant functions by a global variable...
> ultimately, it should be done the way you did it. but this would change the
> api of libtorque, and i do not know how much software is out there which has
> to be modified in this case...
> could you submit your patch, too?
I would like to, but I don't think I can do it right now. I have to clean it
up, and currently I am so swamped with work I can't get a chance to do it.
Also, I have not fully tested all the scenarios, which I hope to do with some
test systems soon. Currently I have the code in production, so I have to be
careful with changes.
The whole reason I created my patch was that, in the case of a server alias,
the job obituaries were not getting returned to the 'failover' server if the
original server failed. In fact, they don't get returned at all, and pbs_mom
goes into a crazy state trying to return the obits; the whole PBS system
gets bogged down and unresponsive. If the HA changes fix this, I might want
to spend more time testing the new version of PBS on my test systems rather
than refining a patch for an old version.
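
For anyone following the thread, here is a rough sketch of the general idea discussed above: passing the local address to bind to as an explicit argument instead of through a global variable. This is not my patch and not the libtorque API. The helper names bound_client_socket() and connect_from_local_ip() are made up for illustration, the sketch assumes IPv4 only, and it ignores everything else the real svr_connect() and client_to_svr() do (for example authentication or local port selection).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of libtorque): create a TCP socket whose
 * source address is pinned to local_ip. Returns the fd, or -1 on error. */
int bound_client_socket(const char *local_ip)
{
    struct sockaddr_in local;
    int fd;

    memset(&local, 0, sizeof(local));
    local.sin_family = AF_INET;
    local.sin_port = htons(0);          /* let the kernel pick the port */
    if (inet_pton(AF_INET, local_ip, &local.sin_addr) != 1)
        return -1;                      /* not a valid IPv4 address string */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* bind() before connect() is what selects the outgoing interface
     * address on a multihomed host. */
    if (bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: connect to server_ip:port using local_ip as the
 * source address. The address is an argument, not a global. */
int connect_from_local_ip(const char *local_ip, const char *server_ip,
                          unsigned short port)
{
    struct sockaddr_in remote;
    int fd;

    memset(&remote, 0, sizeof(remote));
    remote.sin_family = AF_INET;
    remote.sin_port = htons(port);
    if (inet_pton(AF_INET, server_ip, &remote.sin_addr) != 1)
        return -1;

    fd = bound_client_socket(local_ip);
    if (fd < 0)
        return -1;

    if (connect(fd, (struct sockaddr *)&remote, sizeof(remote)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The trade-off Henning mentions shows up directly here: every caller has to supply local_ip, which is why changing the signatures of existing library functions this way would affect software built against them.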
> c u
Toni Harbaugh-Blackford harbaugh at ncifcrf.gov
Advanced Biomedical Computing Center (ABCC)
National Cancer Institute
Contractor - SAIC/Frederick