[torqueusers] IMPORTANT RFC: Changing how the 2.3 cpuset code handles MPI jobs

Garrick Staples garrick at usc.edu
Tue Jun 10 18:45:39 MDT 2008


On Wed, Jun 11, 2008 at 10:34:03AM +1000, Chris Samuel alleged:
> **************************************************************************
> ** Hi all, please can you read this and respond if it would affect you! **
> **************************************************************************
> 
> I've commented in the past that the current 2.3 cpusets
> code puts processes created via tm_spawn into a per-vnode
> (single CPU) cpuset, and that this causes problems for
> us with OpenMPI here:
> 
> http://www.clusterresources.com/pipermail/torqueusers/2008-April/007127.html
> 
> and Dave Singleton from APAC pointed out that this will
> break a number of other MPI implementations:
> 
> http://www.clusterresources.com/pipermail/torqueusers/2008-May/007312.html
> 
> So I would like to suggest two changes for Torque:
> 
> 1) Immediately - change pbs_mom to put tasks created
> by tm_spawn into the *job* cpuset and not the per-vnode
> ones. It's a simple patch and we're using it successfully
> here at VPAC to unbreak OpenMPI.

Makes sense to me.
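
To make the difference concrete, here is a rough sketch of the arithmetic. It assumes (this is not spelled out in the thread) that an MPI launcher started by a single tm_spawn call forks all of the node's ranks itself, so every rank inherits whatever cpuset the launcher was placed in. The function name and arguments are hypothetical and are not part of Torque.

```python
def ranks_per_cpu(job_cpus, ranks_on_node, per_vnode):
    """Average number of MPI ranks sharing each CPU they are allowed to use.

    job_cpus      -- CPUs assigned to the job on this node (hypothetical input)
    ranks_on_node -- ranks forked by the launcher that tm_spawn started
    per_vnode     -- True: launcher sits in a single-CPU per-vnode cpuset
                     False: launcher sits in the whole-job cpuset
    """
    if per_vnode:
        # Assumption: the launcher lands in the first vnode's cpuset,
        # which holds exactly one CPU, and its children inherit it.
        allowed = list(job_cpus[:1])
    else:
        # The job cpuset covers every CPU the job was given on this node.
        allowed = list(job_cpus)
    return ranks_on_node / len(allowed)
```

With four CPUs and four ranks on a node, `ranks_per_cpu([0, 1, 2, 3], 4, True)` gives 4.0 (all ranks squeezed onto one CPU), while `ranks_per_cpu([0, 1, 2, 3], 4, False)` gives 1.0.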

 
> 2) Longer term - create a compile-time or run-time configuration
> option to re-enable per-vnode cpusets for those sites that will
> never run code that breaks with them.

This should be a run-time behaviour. Let the job set an environment variable or something similar.
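
A minimal sketch of what such a run-time switch could look like, written in Python for brevity rather than as pbs_mom code. Everything named here is an assumption for illustration: the variable name TORQUE_CPUSET_MODE, the /dev/cpuset/torque root, and the job/vnode directory layout are hypothetical and are not taken from the Torque source.

```python
import os

# Assumed cpuset mount point and layout; adjust to the real one.
CPUSET_ROOT = "/dev/cpuset/torque"


def cpuset_for_task(job_id, vnode_index, env=None):
    """Pick the cpuset a tm_spawn'ed task should be attached to.

    The job opts in to per-vnode cpusets by setting the (hypothetical)
    variable TORQUE_CPUSET_MODE=vnode; anything else, including an
    unset variable, falls back to the whole-job cpuset.
    """
    env = os.environ if env is None else env
    mode = env.get("TORQUE_CPUSET_MODE", "job").strip().lower()

    job_path = f"{CPUSET_ROOT}/{job_id}"
    if mode == "vnode":
        # Per-vnode (single CPU) cpuset nested under the job cpuset.
        return f"{job_path}/{vnode_index}"
    # Default: the job cpuset, which is the behaviour proposed in (1).
    return job_path
```

Defaulting to the job cpuset keeps MPI launchers working out of the box, and only jobs that ask for the tighter per-vnode binding get it.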

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html