[torqueusers] IMPORTANT RFC: Changing how the 2.3 cpuset code handles MPI jobs
Joshua Bernstein
jbernstein at penguincomputing.com
Wed Jun 11 14:32:12 MDT 2008
Chris Samuel wrote:
> **************************************************************************
> ** Hi all, please can you read this and respond if it would affect you! **
> **************************************************************************
>
> I've commented before that the current 2.3 cpusets code
> puts processes created via tm_spawn into a per-vnode
> (single-CPU) cpuset, which causes problems for us with
> OpenMPI here:
>
> http://www.clusterresources.com/pipermail/torqueusers/2008-April/007127.html
>
> and Dave Singleton from APAC pointed out that this will
> break a number of other MPI implementations:
>
> http://www.clusterresources.com/pipermail/torqueusers/2008-May/007312.html
>
> So I would like to suggest two changes for Torque:
>
> 1) Immediately - change pbs_mom to put tasks created
> by tm_spawn into the *job* cpuset and not the per-vnode
> ones. It's a simple patch and we're using it successfully
> here at VPAC to unbreak OpenMPI.
I'd agree that this makes sense as well.
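
For anyone skimming the thread, here is a rough sketch of the difference between the two placements. It is not the VPAC patch and not pbs_mom code (which is C). It only illustrates the idea, assuming a cpuset layout of <root>/torque/<jobid> for the job cpuset with one child directory per vnode; that layout and the helper names cpuset_dir, attach_task and make_fake_tree are assumptions made up for this example. On Linux a process is moved into a cpuset by writing its PID to that cpuset's "tasks" file, so the proposed change amounts to choosing the parent directory instead of a child.

```python
import os
import tempfile


def cpuset_dir(root, job_id, vnode=None):
    """Return the job cpuset directory, or a per-vnode child if vnode is given.

    The <root>/torque/<job_id>[/<vnode>] layout is an assumption for this sketch.
    """
    path = os.path.join(root, "torque", job_id)
    if vnode is not None:
        path = os.path.join(path, str(vnode))
    return path


def attach_task(root, job_id, pid, vnode=None):
    """Attach pid to a cpuset by writing it to that cpuset's 'tasks' file.

    vnode=None selects the whole-job cpuset (the proposed behaviour);
    passing a vnode index selects the per-vnode cpuset (the 2.3 behaviour
    described in the quoted message).
    """
    tasks = os.path.join(cpuset_dir(root, job_id, vnode), "tasks")
    with open(tasks, "w") as f:
        f.write(f"{pid}\n")
    return tasks


def make_fake_tree(root, job_id, vnodes):
    """Build a stand-in directory tree so the sketch runs without real cpusets."""
    for vnode in [None] + list(vnodes):
        d = cpuset_dir(root, job_id, vnode)
        os.makedirs(d, exist_ok=True)
        # An empty regular file stands in for the kernel's 'tasks' file.
        open(os.path.join(d, "tasks"), "w").close()


if __name__ == "__main__":
    root = tempfile.mkdtemp()          # stand-in for the real cpuset mount point
    make_fake_tree(root, "42.node", [0, 1])
    # Proposed placement: the spawned task goes into the job cpuset.
    print(attach_task(root, "42.node", os.getpid()))
    # Per-vnode placement: the task is confined to a single vnode's cpuset.
    print(attach_task(root, "42.node", os.getpid(), vnode=0))
```

With the job-level placement, an MPI launcher started through tm_spawn can still spread its local ranks over every CPU the job owns on that node, rather than having them all inherit one single-CPU cpuset.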
-Joshua Bernstein
Software Engineer
Penguin Computing