[torqueusers] IMPORTANT RFC: Changing how the 2.3 cpuset code handles MPI jobs

Joshua Bernstein jbernstein at penguincomputing.com
Wed Jun 11 14:32:12 MDT 2008



Chris Samuel wrote:
> **************************************************************************
> ** Hi all, please can you read this and respond if it would affect you! **
> **************************************************************************
> 
> I've commented in the past that the current 2.3 cpusets
> code puts processes created via tm_spawn into a per-vnode
> (single CPU) cpuset, which causes problems for us with
> OpenMPI here:
> 
> http://www.clusterresources.com/pipermail/torqueusers/2008-April/007127.html
> 
> and Dave Singleton from APAC pointed out that this will
> break a number of other MPI implementations:
> 
> http://www.clusterresources.com/pipermail/torqueusers/2008-May/007312.html
> 
> So I would like to suggest two changes for Torque:
> 
> 1) Immediately - change pbs_mom to put tasks created
> by tm_spawn into the *job* cpuset and not the per-vnode
> ones. It's a simple patch and we're using it successfully
> here at VPAC to unbreak OpenMPI.

I'd agree that this makes sense as well.

-Joshua Bernstein
Software Engineer
Penguin Computing

