[torqueusers] pbs_mom binding to a given IP?

Juan Gallego juan+torque at physics.mcgill.ca
Sun Dec 4 12:12:50 MST 2005


greetings all,

i'm deploying torque (v2.0p2) on our cluster and i have a problem stemming
from our awkward network configuration: each node has 2 CPUs, and 2 NICs,
with each NIC connected to a separate network, so each CPU is
`virtually bound' to one of the NICs. to the user, each cpu looks like
its own `virtual node': node0-0, node0-1 (each of the CPUs on node0),
node1-0, node1-1 (on node1), etc.

the pbs_server node file contains the following:

node0-0 np=1
node0-1 np=1
node1-0 np=1
node1-1 np=1
node2-0 np=1
node2-1 np=1
...

the pbs_mom config file contains 2 $pbsserver entries, one for each interface 
on the server (this was necessary to get pbs_server to see the status 
from both the *-0 and *-1 nodes).

the problem is that pbs_mom seems to get confused when it gets what it
considers duplicate requests from its peers (it's really getting one
from, say, node1-0 directed to `node0-0' and another directed to `node0-1').

this could be avoided by running 2 pbs_moms per node (or one per `virtual
node' if you will), but at present that can't be done because pbs_mom binds
to the INADDR_ANY address, so the second pbs_mom's bind fails. if there were
an option to bind a pbs_mom to a particular IP (one to the -0 address and
the other to the -1 address), then our weird setup would work.
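
to illustrate the bind conflict, here is a minimal python sketch (not torque
code). it assumes linux, where every 127.x.x.x address is usable on the
loopback interface; the two loopback addresses stand in for the -0 and -1
IPs, and the helper names try_bind and second_bind_succeeds are made up for
this example. the port is picked by the kernel rather than being pbs_mom's
real port.

```python
import socket


def try_bind(addr, port):
    """Return a TCP socket bound to (addr, port), or None if bind() fails."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((addr, port))
        return s
    except OSError:
        s.close()
        return None


def second_bind_succeeds(first_addr, second_addr):
    """Bind first_addr to a kernel-chosen port, then try second_addr on it."""
    first = try_bind(first_addr, 0)  # port 0: let the kernel pick a free port
    if first is None:
        raise OSError("could not bind the first socket to " + first_addr)
    port = first.getsockname()[1]
    second = try_bind(second_addr, port)
    ok = second is not None
    first.close()
    if second is not None:
        second.close()
    return ok


if __name__ == "__main__":
    # first daemon on the wildcard address (INADDR_ANY): second bind fails
    print("wildcard then wildcard:", second_bind_succeeds("0.0.0.0", "0.0.0.0"))
    # each daemon on its own address: both binds can coexist on one port
    print("one address each:", second_bind_succeeds("127.0.0.1", "127.0.0.2"))
```

in other words, a per-daemon bind address is what would let two daemons
share a port number on one host; whether pbs_mom can be told to do that is
exactly the open question.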

thoughts? suggestions? options? alternatives?

tia,
-- 
juan

