[torqueusers] problem with: set queue
Stewart.Samuels at sanofi-aventis.com
Thu Aug 11 12:21:19 MDT 2005
Would you mind potentially shedding some light in this area? Based on
the bits of code I've found, Garrick's assessment as to mom publishing
its status seems correct. Before TORQUE (and possibly before OpenPBS; I'm
not sure), pbs_server used to probe mom for its status. Now, it seems, mom
regularly provides the status itself. If it is simply an issue of providing the status,
why not just have mom broadcast the status to all pbs_servers to which
it can connect (default and defined in her $PBS_HOME/config file)?
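To make the question concrete, here is a minimal Python sketch contrasting the two policies discussed in this thread: contacting only the first reachable "$clienthost hostname" entry, versus publishing status to every reachable one. This is not TORQUE code. The function names, the can_connect callback, and the hostnames master1 and master2 are all hypothetical, and the special case of a pbs_server running on mom's own node is left out.

```python
def parse_clienthosts(config_text):
    """Return hostnames from '$clienthost <hostname>' lines, in file order."""
    hosts = []
    for line in config_text.splitlines():
        parts = line.split()
        # Any line whose first token is not $clienthost is skipped.
        if len(parts) >= 2 and parts[0] == "$clienthost":
            hosts.append(parts[1])
    return hosts


def first_reachable(hosts, can_connect):
    """Policy described in the thread: use only the first server that answers."""
    for host in hosts:
        if can_connect(host):
            return [host]
    return []


def all_reachable(hosts, can_connect):
    """Proposed policy: publish status to every server that answers."""
    return [host for host in hosts if can_connect(host)]


if __name__ == "__main__":
    # Hypothetical config with two master nodes.
    config = "$clienthost master1\n$clienthost master2\n"
    hosts = parse_clienthosts(config)

    # Stand-in for a real connection attempt: pretend both servers are up.
    def both_up(host):
        return True

    print(first_reachable(hosts, both_up))  # ['master1']
    print(all_reachable(hosts, both_up))    # ['master1', 'master2']
```

With the second policy, a standby master in an HA pair would keep receiving node status without having to probe each mom.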
On Thu, 2005-08-11 at 14:51, garrick wrote:
> On Thu, Aug 11, 2005 at 01:34:00PM -0400, Stewart Samuels alleged:
> > Garrick,
> > Are you saying that mom should support multiple pbs_servers?
> "should" in the sense that I think it is a good idea, yes.
> > Originally, PBS used to. But somewhere between the original PBS code
> > and TORQUE (as of torque-1.2.0p1, anyway), mom ONLY supports 1
> > pbs_server. I have found code in mom that supports my statement. In
> > fact, this is why mom connects only to the first "$clienthost hostname"
> > entry that it can contact. All others are ignored. If
> > mom notices that a pbs_server is running on the same node that she is
> > running on, mom connects to that server and ignores all entries in its
> > $PBS_HOME/config "$clienthost hostname" list.
> > I suspect this changeover was made to allow torque to be much more
> > scalable, but I am not certain of this and it certainly is an issue when
> > you want to have dual master nodes in HA mode.
> I think it got broken when MOM started sending "status" updates to pbs_server
> so that maui didn't have to probe every node. I think the multi-server support
> has been degrading ever since.
> I'm pretty sure re-adding this feature is high on CRI's wish list.
> ("I don't work for CRI" hearsay disclaimer here).