[torqueusers] pbs_mom binding to a given IP?

Garrick Staples garrick at usc.edu
Mon Dec 5 14:26:33 MST 2005


On Mon, Dec 05, 2005 at 03:50:53PM -0500, Juan Gallego alleged:
> On 2005-12-05 11:33-0800, Garrick Staples <garrick at usc.edu> wrote:
> 
> | I don't think the bind address is going to be your only problem with 2
> | MOMs on 1 node.  You'll probably need 2 different PBS_SERVER_HOME
> | directories too.
> 
> yup, but that's a trivial change (I tried, and it worked until the second 
> pbs_mom failed to bind).
> 
> | Why have this awkward setup?  TORQUE handles SMP nodes just fine.  Is
> | there something lacking in the SMP support?  I get the feeling you are
> | solving a problem that should be solved with the scheduler, or maybe
> | full OS virtualization.
> 
> SMP is not the problem. The issue is that all MPI communications now go 
> through one of the two network interfaces (because the node-list that MPI 
> gets from TORQUE has only one IP per node), so the bandwidth is, on 
> average, half what it could be. Because there's no guarantee we'll get both 
> CPUs on each assigned node, we can't just `double up'. Also, bonding both 
> interfaces is not an option.
> 
> If I don't make sense, just ignore :)

So what you really need is just a modified $PBS_NODEFILE?

It might be easier to just patch mpirun, or use mpiexec's hostname
filter feature.
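
For the nodefile route, something along these lines (untested, just a
sketch) could rewrite $PBS_NODEFILE before mpirun runs.  The "-eth0" and
"-eth1" hostname suffixes are made up here; substitute whatever names
actually resolve to your two interfaces:

import os

nodefile = os.environ["PBS_NODEFILE"]
# Hypothetical per-interface hostnames: node01-eth0, node01-eth1, etc.
suffixes = ["-eth0", "-eth1"]
seen = {}  # how many times each node has appeared so far

with open(nodefile) as f:
    hosts = [line.strip() for line in f if line.strip()]

with open(nodefile + ".multi", "w") as out:
    for host in hosts:
        i = seen.get(host, 0)
        seen[host] = i + 1
        # Alternate interfaces across repeated entries for the same node
        out.write(host + suffixes[i % len(suffixes)] + "\n")

Then point mpirun's -machinefile at the rewritten file instead of the
original $PBS_NODEFILE.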

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California