[torqueusers] problem with: set queue cfqroute_destinations=cfq@other.host

garrick garrick at usc.edu
Thu Aug 11 12:51:01 MDT 2005


On Thu, Aug 11, 2005 at 01:34:00PM -0400, Stewart Samuels alleged:
> Garrick,
> 
> Are you saying that mom should support multiple pbs_servers? 

"should" in the sense that I think it is a good idea, yes.

> Originally, PBS did.  But somewhere between the original PBS code
> and TORQUE (at least up to torque-1.2.0p1), mom came to support ONLY
> one pbs_server.  I have found code in mom that supports my statement.
> In fact, this is why mom connects only to the first "$clienthost
> hostname" entry she can contact; all others are ignored.  If mom
> notices that a pbs_server is running on the same node that she is
> running on, she connects to that server and ignores all entries in
> her $PBS_HOME/mom_priv/config "$clienthost hostname" list.
> 
> I suspect this changeover was made to allow TORQUE to be much more
> scalable, but I am not certain of this, and it is certainly an issue
> when you want to have dual master nodes in HA mode.
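Read literally, the behaviour described above amounts to a simple selection rule: a local pbs_server wins, otherwise the first reachable "$clienthost" entry wins and the rest are ignored. The following is a minimal Python model of that description only, not TORQUE's actual mom code; the function select_server and the server_running_on probe are hypothetical names used for illustration.

```python
from typing import Callable, Optional, Sequence


def select_server(
    local_host: str,
    clienthosts: Sequence[str],
    server_running_on: Callable[[str], bool],
) -> Optional[str]:
    """Hypothetical model of mom's single-server choice as described in the thread.

    local_host: the node mom itself runs on.
    clienthosts: hostnames from the "$clienthost" lines, in file order.
    server_running_on: assumed probe reporting whether a pbs_server answers on a host.
    """
    # A pbs_server on mom's own node wins; the $clienthost list is ignored.
    if server_running_on(local_host):
        return local_host
    # Otherwise the first reachable $clienthost entry is used and the
    # remaining entries are never contacted.
    for host in clienthosts:
        if server_running_on(host):
            return host
    # No pbs_server could be contacted.
    return None


if __name__ == "__main__":
    up = {"master1", "master2"}
    print(select_server("node01", ["master1", "master2"], lambda h: h in up))  # master1
```

Under this rule a second HA master listed after the first never receives a connection while the first one is reachable, which is the limitation being discussed.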

I think it got broken when MOM started sending "status" updates to pbs_server
so that maui didn't have to probe every node.  I think the multi-server support
has been degrading ever since.

I'm pretty sure re-adding this feature is high on CRI's wish list.
("I don't work for CRI" hearsay disclaimer here.)

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California