[torqueusers] jobs completing with processes still running
David.Singleton at anu.edu.au
Fri May 9 07:15:02 MDT 2008
I think you'll find nearly all MPIs do what OpenMPI does: SGI MPT,
LAM, Intel MPI, Quadrics MPI, I think even the latest MPICH ...
As I suggested earlier, I don't think subcpusets are the MOM's
domain. Unfortunately, they are the application's (i.e. mpirun's)
responsibility. In general, the MOM cannot know what size
subcpusets to build.
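
A rough sketch of why the sizing has to come from the launcher: the same
8-CPU node cpuset should be cut up differently depending on how many
ranks (and threads per rank) the job runs on the node, and only mpirun
knows that. The helper name split_cpus and the round-robin policy are
made up for illustration; this is not code from Torque or from any MPI.

```python
# Hypothetical helper, not part of Torque or of any MPI launcher.
def split_cpus(allowed, nprocs):
    """Deal the CPUs in `allowed` round-robin into `nprocs` subsets."""
    cpus = sorted(allowed)
    if nprocs < 1 or nprocs > len(cpus):
        raise ValueError("need between 1 and len(allowed) processes")
    # Slice i takes every nprocs-th CPU starting at position i.
    return [set(cpus[i::nprocs]) for i in range(nprocs)]


# Node cpuset of 8 CPUs, hybrid job: 2 ranks that each want 4 CPUs.
print(split_cpus(range(8), 2))

# Same node cpuset, pure MPI job: 8 single-threaded ranks.
print(split_cpus(range(8), 8))
```

On Linux a launcher could then apply each subset to a child with
something like os.sched_setaffinity(), but that step is left out here.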
Brock Palen wrote:
> How do other tm launchers do it?
> If OMPI is different from all the others, I could make a request to OMPI
> devel to change it to match others.
> Brock Palen
> Center for Advanced Computing
> brockp at umich.edu
> On May 8, 2008, at 11:08 PM, Chris Samuel wrote:
>> ----- "Lloyd Brown" <lloyd_brown at byu.edu> wrote:
>>> If you do decide to use OpenMPI, though, be sure that the version you
>>> install has the TM interface enabled.
>> Also be aware that you don't want to use the current
>> cpusets implementation in Torque 2.3.0 with OpenMPI
>> as it stands.
>> This is because the current implementation assumes
>> that there will be a tm_spawn() for each MPI child on
>> each node and can put each into a single CPU cpuset
>> based on the vnode.
>> Unfortunately OpenMPI (and probably LAM too) makes
>> only one tm_spawn per *node* it will be using to
>> start the daemon (orted) and then that spawns the
>> MPI processes on the node.
>> So with the current implementation you end up with
>> all the MPI processes on a node sharing a single CPU,
>> which is a Bad Thing(tm).
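
The effect described above can be shown with a toy model. It assumes,
as the quoted text says, that each tm_spawn()ed task gets its own
single-CPU cpuset, and additionally that anything the task forks stays
in that cpuset. The function and variable names are invented for the
example; nothing here calls Torque or OpenMPI.

```python
# Toy model only: none of these names are Torque or OpenMPI APIs.
from collections import Counter


def assign_cpus(spawn_calls, children_per_spawn):
    """Return the CPU each MPI process ends up on for one node.

    Assumption: the i-th tm_spawn() on the node is confined to CPU i,
    and every process that task forks inherits the same cpuset.
    """
    placement = []
    for cpu in range(spawn_calls):
        placement.extend([cpu] * children_per_spawn)
    return placement


# Launcher that calls tm_spawn() once per MPI process (4 spawns).
per_process = assign_cpus(spawn_calls=4, children_per_spawn=1)

# Launcher that calls tm_spawn() once per node: one daemon forks 4 ranks.
per_node = assign_cpus(spawn_calls=1, children_per_spawn=4)

print("one spawn per process:", dict(Counter(per_process)))
print("one spawn per node:   ", dict(Counter(per_node)))
```

In the second case all four ranks land on the same CPU, which is the
situation the quoted message warns about.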
>> --Christopher Samuel - (03) 9925 4751 - Systems Manager
>> The Victorian Partnership for Advanced Computing
>> P.O. Box 201, Carlton South, VIC 3053, Australia
>> VPAC is a not-for-profit Registered Research Agency
>> torqueusers mailing list
>> torqueusers at supercluster.org