[torqueusers] pbs_mom binding to a given IP?

Garrick Staples garrick at usc.edu
Mon Dec 5 12:33:05 MST 2005


On Sun, Dec 04, 2005 at 02:12:50PM -0500, Juan Gallego alleged:
> 
> greetings all,
> 
> i'm deploying torque (v2.0p2) on our cluster and i have a problem stemming
> from our awkward network configuration: each node has 2 CPUs, and 2 NICs,
> with each NIC connected to a separate network, so each CPU is
> `virtually bound' to one of the NICs. to the user, it looks like
> one cpu, one `virtual node': node0-0, node0-1 (each of the CPUs on node0),
> node1-0, node1-1 (on node1), etc.
> 
> the pbs_server node file contains the following:
> 
> node0-0 np=1
> node0-1 np=1
> node1-0 np=1
> node1-1 np=1
> node2-0 np=1
> node2-1 np=1
> ...
> 
> the pbs_mom config file contains 2 $pbsserver entries, one for each interface 
> on the server (this was necessary to get pbs_server to see the status 
> from both the *-0 and *-1 nodes).
> 
> the problem is that pbs_mom seems to get confused when it gets what it
> considers duplicate requests from its peers (it's really getting one
> from say node1-0 directed to `node0-0' and another directed to `node0-1').
> 
> this could be avoided by running 2 pbs_mom per node (or one per `virtual
> node' if you will), but now it can't be done because pbs_mom binds to the
> INADDR_ANY address, so the second pbs_mom's bind fails. if there was an
> option to bind a pbs_mom to a particular IP (one to the -0 and the other to
> -1), then our weird setup would work.
> 
> thoughts? suggestions? options? alternatives?

I don't think the bind address is going to be your only problem with 2
MOMs on 1 node.  You'll probably need 2 different PBS_SERVER_HOME
directories too.
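
For reference, the bind conflict described above can be reproduced outside
TORQUE. The following is a minimal Python sketch, not pbs_mom code: the helper
names are hypothetical, and it assumes plain TCP sockets without SO_REUSEADDR.
It shows that a second bind to INADDR_ANY on the same port is refused, and
gives the shape of the alternative, one socket per specific address.

```python
import errno
import socket


def bind_tcp(address, port):
    """Return a TCP socket bound to (address, port); the caller closes it."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
    except OSError:
        sock.close()
        raise
    return sock


def second_wildcard_bind_fails():
    """Bind INADDR_ANY twice on one port; report whether the second bind
    was refused with EADDRINUSE."""
    first = bind_tcp("0.0.0.0", 0)  # port 0: the kernel picks a free port
    port = first.getsockname()[1]
    try:
        second = bind_tcp("0.0.0.0", port)
    except OSError as exc:
        return exc.errno == errno.EADDRINUSE
    else:
        second.close()
        return False
    finally:
        first.close()


def bind_per_address(addresses, port):
    """Bind one socket per specific address, all on the same port.

    Hypothetical illustration of per-IP binding; on failure, sockets
    that were already bound are closed before the error is re-raised.
    """
    socks = []
    try:
        for address in addresses:
            socks.append(bind_tcp(address, port))
    except OSError:
        for sock in socks:
            sock.close()
        raise
    return socks
```

On a node with two interfaces, bind_per_address would be called with the -0
and -1 addresses and a fixed port. On a Linux host, loopback addresses such as
127.0.0.1 and 127.0.0.2 can usually stand in for the two interfaces when
trying this out.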

Why have this awkward setup?  TORQUE handles SMP nodes just fine.  Is
there something lacking in the SMP support?  I get the feeling you are
solving a problem that should be solved with the scheduler, or maybe
full OS virtualization.
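
As a sketch of the SMP-style alternative, assuming the per-NIC split can be
dropped and each physical node answers to a single hostname (node0, node1,
node2 are hypothetical names here), the pbs_server node file would list each
node once with both CPUs:

```
node0 np=2
node1 np=2
node2 np=2
...
```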

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California