[torqueusers] problem with: set queue cfq route_destinations=cfq@other.host

garrick garrick at usc.edu
Thu Aug 11 20:10:26 MDT 2005


On Thu, Aug 11, 2005 at 08:03:16PM -0400, Daniel Widyono alleged:
> > Would you mind potentially shedding some light in this area?  Based on
> > the bits of code I've found, Garrick's assessment as to mom publishing
> > its status seems correct.  Prior to OpenPBS (not sure) and/or torque,
> > pbs_server used to probe mom for a status.  Now it seems mom regularly
> > provides the status.  If it is simply an issue of providing the status,
> > why not just have mom broadcast the status to all pbs_servers to which
> > it can connect (default and defined in her $PBS_HOME/config file)?
> 
> Might it be a good idea to simply broadcast its status, period, similar to
> ganglia's gmond?  Each pbs_server knows which nodes it cares about; it can
> filter status packets accordingly.  I have a suspicion that this might scale
> better (multicasting even better, but that's a whole other ballgame that I
> have _no_ experience in).

Thousands of MOMs all multicasting their status would be awful.  Multicasting
is useful with 1 source and lots of "listeners".  TORQUE is the exact opposite:
lots of sources and few listeners.
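To put rough numbers on the "many sources, few listeners" point, here is a back-of-the-envelope sketch. It is a simplified model, not TORQUE code: it assumes one status message per MOM per reporting interval, and that in the multicast case every pbs_server joins one shared group and so receives (and must filter) every MOM's update. The function name and parameters are hypothetical.

```python
def deliveries_per_interval(n_moms: int, n_servers: int, servers_per_mom: int = 1) -> dict:
    """Count status messages per interval under two hypothetical models.

    unicast:   each MOM sends one message to each server it reports to.
    multicast: each MOM sends one message to a group that every server
               has joined, so every server receives every update.
    """
    if not 1 <= servers_per_mom <= n_servers:
        raise ValueError("servers_per_mom must be between 1 and n_servers")
    wanted = n_moms * servers_per_mom          # updates some server actually cares about
    unicast_sent = wanted                      # one packet per (MOM, interested server)
    multicast_sent = n_moms                    # one packet per MOM, regardless of listeners
    multicast_received = n_moms * n_servers    # every listener gets every update
    return {
        "unicast_sent": unicast_sent,
        "multicast_sent": multicast_sent,
        "multicast_received": multicast_received,
        "multicast_discarded": multicast_received - wanted,
    }
```

With 2000 MOMs and 2 servers, each MOM reporting to one server, multicast sends the same 2000 packets as unicast but delivers 4000, half of which are thrown away. Even if every MOM reports to both servers, multicast saves only a factor of two on the sending side, because the number of listeners is small.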

(I haven't had OpenPBS running for a long time, so this is all from fuzzy
memory... someone correct me if I'm wrong)
The previous support was very incomplete.  MOM really only supported one server
connection at a time.  The last server to connect would be the "one true
server" and would get "state updates" (not the same as "status updates").  But
the MOMs accepted requests from every host listed in $clienthost, so things
mostly worked anyway.
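A toy model of the older single-server behaviour described above, written as a Python sketch rather than anything from the MOM source. The class and method names are hypothetical, and it assumes (as the paragraph suggests) that a server connection is only honoured if the host is listed in $clienthost.

```python
class LegacyMom:
    """Hypothetical model of the old behaviour: one server slot only.

    Whichever allowed server connected last becomes the "one true server"
    and receives state updates; requests are still accepted from any
    host listed in $clienthost.
    """

    def __init__(self, clienthosts):
        self.clienthosts = set(clienthosts)
        self.server = None  # single slot, overwritten on every connect

    def on_server_connect(self, host):
        # Assumption: only hosts listed in $clienthost may connect.
        if host in self.clienthosts:
            self.server = host  # last connection wins

    def accepts_request_from(self, host):
        return host in self.clienthosts

    def state_update_targets(self):
        return [] if self.server is None else [self.server]
```

If `srv1` and then `srv2` connect, only `srv2` receives state updates, yet `srv1` can still submit requests, which is why things "mostly worked".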

Without node polling, MOM needs to keep track of multiple server connections
and send updates to every server that has "checked in" (and is listed in
$clienthost).  This will give far better multi-server support than before.
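A minimal sketch of that bookkeeping, assuming a MOM that keeps a set of checked-in servers and pushes each status update to all of them. This is an illustration in Python, not the actual TORQUE implementation; the class, its methods, and the caller-supplied `send` callback are hypothetical.

```python
class MultiServerMom:
    """Hypothetical sketch: remember every server that has checked in and
    is listed in $clienthost, then push each status update to all of them."""

    def __init__(self, clienthosts):
        self.clienthosts = set(clienthosts)
        self.checked_in = set()

    def on_server_checkin(self, host):
        # Only servers listed in $clienthost are tracked.
        if host in self.clienthosts:
            self.checked_in.add(host)
            return True
        return False

    def send_status(self, status, send):
        # send(host, status) stands in for the real transport (assumption).
        targets = sorted(self.checked_in)
        for host in targets:
            send(host, status)
        return targets
```

Unlike the single-slot model, a second server checking in does not displace the first; both receive every status update.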

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California