[torqueusers] pbs_mom binding to a given IP?
garrick at usc.edu
Mon Dec 5 14:26:33 MST 2005
On Mon, Dec 05, 2005 at 03:50:53PM -0500, Juan Gallego alleged:
> On 2005-12-05 11:33-0800, Garrick Staples <garrick at usc.edu> wrote:
> | I don't think the bind address is going to be your only problem with 2
> | MOMs on 1 node. You'll probably need 2 different PBS_SERVER_HOME
> | directories too.
> yup, but that's a trivial change (i tried, and it worked until the second
> pbs_mom failed to bind).
> | Why have this awkward setup? TORQUE handles SMP nodes just fine. Is
> | there something lacking in the SMP support? I get the feeling you are
> | solving a problem that should be solved with the scheduler, or maybe
> | full OS virtualization.
> SMP is not the problem, it's that now all MPI communications are done
> through one of the network interfaces (because the node-list that MPI gets
> from torque only has one IP for each node), thus the bandwidth is, on
> average, half what it could be. because there's no guarantee we'll get both
> CPUs on each assigned node, we can't just `double up'. also, bonding both
> interfaces is not an option.
> if i don't make sense, just ignore :)
So what you really need is just a modified $PBS_NODEFILE?
It might be easier to just patch mpirun, or use mpiexec's hostname
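
For the modified-nodefile route, here is a rough sketch of what I have in mind, not something TORQUE provides. It assumes each node's second interface has its own hostname, formed by adding a suffix to the short name ("-b" here is a made-up convention; use whatever your DNS or /etc/hosts actually has). The function name rewrite_nodefile is also just for illustration. Every second slot assigned on a node gets the alternate name, so a job that lands on both CPUs of a node uses both interfaces, and a job with only one CPU there keeps the primary name.

```python
import os
from collections import defaultdict


def rewrite_nodefile(lines, suffix="-b"):
    """Alternate between a node's primary and secondary hostname.

    `lines` holds one hostname per assigned CPU slot, as in $PBS_NODEFILE.
    `suffix` is a hypothetical naming convention for the second interface.
    """
    seen = defaultdict(int)  # slots seen so far for each host
    out = []
    for raw in lines:
        host = raw.strip()
        if not host:
            continue  # skip blank lines
        count = seen[host]
        seen[host] += 1
        if count % 2 == 0:
            out.append(host)  # 1st, 3rd, ... slot: primary interface
        else:
            # 2nd, 4th, ... slot: put the suffix before any domain part
            short, dot, rest = host.partition(".")
            out.append(short + suffix + dot + rest)
    return out


if __name__ == "__main__":
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as fh:
            for name in rewrite_nodefile(fh):
                print(name)
```

Inside a job script you would redirect the output to a temporary file and hand that to mpirun as its machine file in place of $PBS_NODEFILE. The alternate names have to resolve to the second interface's address on every node, otherwise nothing is gained.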
Garrick Staples, Linux/HPCC Administrator
University of Southern California