[torqueusers] problem with: set queue cfq route_destinations=cfq@other.host

Stewart.Samuels at sanofi-aventis.com
Fri Aug 12 07:11:51 MDT 2005


Good point.


-----Original Message-----
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org]On Behalf Of garrick
Sent: Thursday, August 11, 2005 10:10 PM
To: torqueusers at supercluster.org
Subject: Re: [torqueusers] problem with: set queue cfq
route_destinations=cfq at other.host

On Thu, Aug 11, 2005 at 08:03:16PM -0400, Daniel Widyono alleged:
> > Would you mind potentially shedding some light in this area?  Based on
> > the bits of code I've found, Garrick's assessment as to mom publishing
> > its status seems correct.  Prior to OpenPBS (not sure) and/or torque,
> > pbs_server used to probe mom for a status.  Now, it seems, mom regularly
> > provides the status.  If it is simply an issue of providing the status,
> > why not just have mom broadcast the status to all pbs_servers to which
> > it can connect (default and defined in her $PBS_HOME/config file)?
> Might it be a good idea to simply broadcast its status, period, similar to
> ganglia's gmond?  Each pbs_server knows which nodes it cares about; it can
> filter status packets accordingly.  I have a suspicion that this might scale
> better (multicasting even better, but that's a whole other ballgame that I
> have _no_ experience in).

Thousands of MOMs all multicasting their status would be awful.  Multicasting
is useful with 1 source and lots of "listeners".  TORQUE is the exact opposite:
lots of sources and few listeners.
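
To put rough numbers on that, here is a back-of-the-envelope sketch in Python.
It assumes an idealized network (no loss, one status packet per MOM per
interval) and that only the servers join the multicast group; the function
name and the cluster sizes are made up for illustration.

```python
def packets_per_interval(num_moms: int, num_servers: int) -> dict:
    """Count status packets per update interval under idealized assumptions."""
    return {
        # Unicast: every MOM sends one packet to every server.
        "unicast_sent": num_moms * num_servers,
        # Multicast: every MOM sends a single packet to the group.
        "multicast_sent": num_moms,
        # Either way, each server still receives one packet per MOM.
        "received_per_server": num_moms,
    }


# Hypothetical cluster: 2000 MOMs reporting to 2 servers.
counts = packets_per_interval(2000, 2)
print(counts)
```

With only a couple of listeners, multicast reduces what each MOM sends by a
factor equal to the number of servers, while the load on each server stays the
same.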

(I haven't had OpenPBS running for a long time, so this is all from fuzzy
memory... someone correct me if I'm wrong)
The previous support was very incomplete.  MOM really only supported 1 server
connection at a time.  The last server to connect would be the "one true
server" and would get "state updates" (not the same as "status updates").  But
the MOMs accepted requests from all $clienthosts so things mostly kind of
worked anyway.

Without node polling, MOM needs to keep track of multiple server connections
and send updates to all servers that have "checked in" (and are listed in
$clienthost).  This will result in far better support than before.
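
The bookkeeping described above could be sketched roughly as follows.  This is
not TORQUE's actual code (pbs_mom is written in C); it is a minimal Python
illustration, and the class, method, and host names are all hypothetical.  The
clienthosts set stands in for the $clienthost entries in MOM's config.

```python
class MomStatusPublisher:
    """Hypothetical sketch: track servers that have checked in and push status to them."""

    def __init__(self, clienthosts):
        # Stand-in for the hosts listed as $clienthost in MOM's config.
        self.clienthosts = set(clienthosts)
        # Servers that have contacted this MOM at least once.
        self.checked_in = set()

    def server_checked_in(self, host):
        """Remember a server that contacted us; ignore hosts that are not trusted."""
        if host not in self.clienthosts:
            return False
        self.checked_in.add(host)
        return True

    def publish(self, status, send):
        """Send the status to every checked-in server and return the hosts used."""
        sent_to = []
        for host in sorted(self.checked_in):
            send(host, status)  # 'send' is supplied by the caller (assumed transport)
            sent_to.append(host)
        return sent_to


# Usage with a fake transport that just records what would be sent.
outbox = []
mom = MomStatusPublisher(["server-a", "server-b"])
mom.server_checked_in("server-a")
mom.server_checked_in("server-b")
mom.server_checked_in("stranger")  # not in clienthosts, so it is ignored
mom.publish({"state": "free"}, lambda host, status: outbox.append((host, status)))
```

The point of the set of checked-in servers is that updates go only to servers
that have actually made contact, rather than to the last one to connect.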

Garrick Staples, Linux/HPCC Administrator
University of Southern California
