[torqueusers] IMPORTANT RFC: Changing how the 2.3 cpuset code handles MPI jobs

Chris Samuel csamuel at vpac.org
Tue Jun 10 18:34:03 MDT 2008


**************************************************************************
** Hi all, please can you read this and respond if it would affect you! **
**************************************************************************

I've commented in the past that the current 2.3 cpusets
code causes problems for us here with OpenMPI, because it
puts processes created via tm_spawn into a per-vnode
(single-CPU) cpuset:

http://www.clusterresources.com/pipermail/torqueusers/2008-April/007127.html
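To show why a single-CPU cpuset hurts, here is a simplified model, not Torque or OpenMPI code. It assumes that, as with OpenMPI's TM launcher, mpirun starts one daemon per node via tm_spawn and that daemon forks the local ranks. Child processes inherit their parent's cpuset, so wherever the daemon lands, every rank on that node lands too. The function name `rank_cpus` and its parameters are hypothetical.

```python
def rank_cpus(node_cpus, ranks_on_node, per_vnode_cpusets):
    """Return the set of CPUs each local MPI rank may run on (simplified model).

    node_cpus: CPU ids allocated to the job on this node.
    ranks_on_node: number of ranks the per-node daemon forks.
    per_vnode_cpusets: True models current 2.3 behaviour (the tm_spawn'd
        daemon is confined to a single vnode, i.e. one CPU; here, the first);
        False models the proposed behaviour (daemon goes in the job cpuset).
    """
    if per_vnode_cpusets:
        daemon_cpus = frozenset(node_cpus[:1])
    else:
        daemon_cpus = frozenset(node_cpus)
    # Forked children inherit the parent's cpuset unchanged.
    return [daemon_cpus for _ in range(ranks_on_node)]
```

With four CPUs and four ranks on a node, `rank_cpus([0, 1, 2, 3], 4, True)` gives every rank only CPU 0, so all four ranks share one core. With `False` each rank can use all four CPUs.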

and Dave Singleton from APAC pointed out that this behaviour
will also break a number of other MPI implementations:

http://www.clusterresources.com/pipermail/torqueusers/2008-May/007312.html

So I would like to suggest two changes for Torque:

1) Immediately - change pbs_mom to put tasks created
by tm_spawn into the *job* cpuset rather than the per-vnode
ones. It's a simple patch, and we're using it successfully
here at VPAC to unbreak OpenMPI.

2) Longer term - create a compile-time or run-time
configuration option to re-enable per-vnode cpusets for
sites that will never run code that breaks under them.
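The two proposals amount to a placement policy for tm_spawn'd tasks, sketched below under stated assumptions. The directory layout (`/dev/cpuset/torque/<jobid>` with per-vnode children named by vnode index) and the `per_vnode_cpusets` switch are hypothetical, chosen only to illustrate the idea; this is not the actual pbs_mom patch. Attaching a process by writing its PID to the cpuset's `tasks` file is the standard Linux cpuset filesystem interface.

```python
import os

# Hypothetical layout, assumed for illustration only.
CPUSET_ROOT = "/dev/cpuset/torque"


def target_cpuset(jobid, vnode, per_vnode_cpusets=False, root=CPUSET_ROOT):
    """Choose the cpuset directory for a task created via tm_spawn.

    The default (False) models proposal (1): always use the job cpuset.
    True models proposal (2): a hypothetical site option restoring
    per-vnode cpusets.
    """
    job_dir = os.path.join(root, jobid)
    if per_vnode_cpusets:
        return os.path.join(job_dir, str(vnode))
    return job_dir


def attach_pid(cpuset_dir, pid):
    """Move a process into a cpuset by writing its PID to the 'tasks' file."""
    with open(os.path.join(cpuset_dir, "tasks"), "w") as f:
        f.write(f"{pid}\n")
```

For example, `target_cpuset("123.server", 2)` returns `/dev/cpuset/torque/123.server`, while passing `per_vnode_cpusets=True` returns `/dev/cpuset/torque/123.server/2`. Making the job cpuset the default matches proposal (1): sites have to opt in to the stricter per-vnode placement.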

** Can anyone see any reason not to do this? **

Option (1) shouldn't break anything, and will unbreak
sites using any of the affected MPI implementations.

Thanks,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

