[torqueusers] IMPORTANT RFC: Changing how the 2.3 cpuset code handles MPI jobs
Garrick Staples
garrick at usc.edu
Tue Jun 10 18:45:39 MDT 2008
On Wed, Jun 11, 2008 at 10:34:03AM +1000, Chris Samuel alleged:
> **************************************************************************
> ** Hi all, please can you read this and respond if it would affect you! **
> **************************************************************************
>
> I've commented in the past that the current 2.3 cpusets
> code puts processes created via tm_spawn into a per-vnode
> (single CPU) cpuset, and that this causes problems for
> us with OpenMPI here:
>
> http://www.clusterresources.com/pipermail/torqueusers/2008-April/007127.html
>
> and Dave Singleton from APAC pointed out that this will
> break a number of other MPI implementations:
>
> http://www.clusterresources.com/pipermail/torqueusers/2008-May/007312.html
>
> So I would like to suggest two changes for Torque:
>
> 1) Immediately - change pbs_mom to put tasks created
> by tm_spawn into the *job* cpuset and not the per-vnode
> ones. It's a simple patch and we're using it successfully
> here at VPAC to unbreak OpenMPI.
Makes sense to me.
> 2) Longer term - create a compile/run time configuration
> option to re-enable the per-vnode cpusets for those sites
> that will never run code that breaks with them.
This should be a run-time behaviour. Let the job set an environment variable or something similar.
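
To make the idea concrete, here is a rough sketch of what a per-job, run-time switch might look like. It is not taken from pbs_mom: the variable name TORQUE_PER_VNODE_CPUSET, the /dev/cpuset/torque root, and both function names are assumptions made up for illustration. The default is the job cpuset, as in change 1), and a job opts back in to the per-vnode cpuset by setting the variable to "1".

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names: neither the variable nor the path layout is
 * confirmed to match what Torque actually uses. */
#define PER_VNODE_ENV "TORQUE_PER_VNODE_CPUSET"
#define CPUSET_ROOT   "/dev/cpuset/torque"

/* Returns 1 when the job opted in to per-vnode cpusets, 0 otherwise.
 * 'value' is the job's setting for PER_VNODE_ENV, or NULL if unset. */
int want_per_vnode(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Builds the cpuset path for a task created via tm_spawn.
 * Default: the job-wide cpuset. Opt-in: the per-vnode cpuset.
 * Returns the snprintf() result so the caller can detect truncation. */
int task_cpuset_path(char *buf, size_t len, const char *jobid,
                     int vnode, const char *env_value)
{
    assert(buf != NULL && jobid != NULL);

    if (want_per_vnode(env_value))
        return snprintf(buf, len, "%s/%s/%d", CPUSET_ROOT, jobid, vnode);

    return snprintf(buf, len, "%s/%s", CPUSET_ROOT, jobid);
}
```

The setting is passed in as a string rather than read with getenv(), since it would have to come from the job's environment and not from the daemon's own. A caller would look the variable up in the job's variable list and hand the result (or NULL) to task_cpuset_path().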
--
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html