[torqueusers] pbs_mom binding to a given IP?
juan+torque at physics.mcgill.ca
Sun Dec 4 12:12:50 MST 2005
i'm deploying torque (v2.0p2) on our cluster and i have a problem stemming
from our awkward network configuration: each node has 2 CPUs, and 2 NICs,
with each NIC connected to a separate network, so each CPU is
`virtually bound' to one of the NICs. to the user, it looks like
one cpu per `virtual node': node0-0 and node0-1 (the two CPUs on node0),
node1-0, node1-1 (on node1), etc.
the pbs_server node file contains the following:
the pbs_mom config file contains 2 $pbsserver entries, one for each interface
on the server (this was necessary to get pbs_server to see the status
from both the *-0 and *-1 nodes).
the problem is that pbs_mom seems to get confused when it receives what it
considers duplicate requests from its peers (it's really getting one
from, say, node1-0 directed to `node0-0' and another directed to `node0-1').
this could be avoided by running 2 pbs_moms per node (one per `virtual
node', if you will), but currently that can't be done because pbs_mom binds
to the INADDR_ANY address, so the second pbs_mom's bind fails. if there were
an option to bind a pbs_mom to a particular IP (one to the -0 interface and
the other to -1), then our weird setup would work.
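the bind failure is ordinary BSD-sockets behaviour, not something specific to torque. below is a minimal python sketch, assuming a Linux host: there the whole 127.0.0.0/8 block lives on the loopback interface, so 127.0.0.1 and 127.0.0.2 can stand in for the -0 and -1 NICs without any setup (on other systems 127.0.0.2 may not exist). a socket bound to INADDR_ANY (0.0.0.0) claims its port on every local address, so a second bind to that port on any single address fails with EADDRINUSE, while two sockets bound to two distinct addresses can share one port. the helper names are hypothetical, SO_REUSEADDR is left unset, and this is not pbs_mom's actual code.

```python
import errno
import socket
from typing import Optional, Tuple


def try_bind(ip: str, port: int) -> Tuple[Optional[socket.socket], int]:
    """Bind a fresh TCP socket to (ip, port) without SO_REUSEADDR.

    Returns (socket, 0) on success, or (None, errno) if bind() fails.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind((ip, port))
    except OSError as e:
        s.close()
        return None, e.errno
    return s, 0


def close_all(*socks: Optional[socket.socket]) -> None:
    for s in socks:
        if s is not None:
            s.close()


def wildcard_then_specific() -> int:
    """First daemon binds INADDR_ANY; the second tries one specific address."""
    first, _ = try_bind("0.0.0.0", 0)  # port 0: let the kernel pick a free port
    port = first.getsockname()[1]
    second, err = try_bind("127.0.0.1", port)  # same port, specific address
    close_all(first, second)
    return err  # expected: EADDRINUSE, the wildcard already owns this port


def specific_then_specific() -> int:
    """Each daemon binds only its own address; the port number is shared."""
    first, _ = try_bind("127.0.0.1", 0)
    port = first.getsockname()[1]
    # assumption: 127.0.0.2 is a valid local address (true on Linux loopback)
    second, err = try_bind("127.0.0.2", port)
    close_all(first, second)
    return err  # expected: 0, distinct addresses do not conflict


if __name__ == "__main__":
    for name, fn in [("0.0.0.0 then 127.0.0.1", wildcard_then_specific),
                     ("127.0.0.1 then 127.0.0.2", specific_then_specific)]:
        err = fn()
        print(name + ":", errno.errorcode.get(err, "ok"))
```

on Linux the first case should report EADDRINUSE and the second ok; the second case is the arrangement a per-IP bind option would allow, with one pbs_mom on each interface.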
thoughts? suggestions? options? alternatives?