[torqueusers] jobs completing with processes still running - SOLVED

David Singleton David.Singleton at anu.edu.au
Fri May 9 07:15:02 MDT 2008


I think you'll find nearly all MPIs do what OpenMPI does: SGI MPT,
LAM, Intel MPI, Quadrics MPI, I think even the latest MPICH ...
As I suggested earlier, I don't think subcpusets are the MOM's
domain.  Unfortunately, they are the application's (i.e. mpirun's)
responsibility.  In general, the MOM cannot know what size
subcpusets to build.
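
To make the difference concrete, here is a toy sketch in Python.  It
assumes a MOM that simply hands each tm_spawn() on a node the next
single-CPU cpuset, which is my reading of the behaviour Chris describes
below.  The function assign_cpusets and its arguments are made up for
illustration; none of this is Torque code and nothing here calls the
real TM API or touches real cpusets.

```python
# Toy model only: hypothetical names, no real TM or cpuset calls.
from collections import defaultdict


def assign_cpusets(spawns, cpus_per_node):
    """Model a MOM that gives each tm_spawn() the next single-CPU cpuset.

    spawns is a list of (node, children) pairs, one per modelled
    tm_spawn() call; children is the number of MPI processes that end
    up running inside the cpuset created for that spawn.
    """
    next_cpu = defaultdict(int)
    placement = defaultdict(lambda: defaultdict(int))  # node -> cpu -> count
    for node, children in spawns:
        cpu = next_cpu[node] % cpus_per_node
        next_cpu[node] += 1
        placement[node][cpu] += children
    return {node: dict(cpus) for node, cpus in placement.items()}


# One tm_spawn() per MPI process: 4 ranks on node "n1", one CPU each.
per_process = assign_cpusets([("n1", 1)] * 4, cpus_per_node=4)

# One tm_spawn() per node: a launcher daemon starts 4 ranks, and all of
# them inherit the daemon's single-CPU cpuset.
per_node = assign_cpusets([("n1", 4)], cpus_per_node=4)

print(per_process)  # {'n1': {0: 1, 1: 1, 2: 1, 3: 1}}
print(per_node)     # {'n1': {0: 4}}
```

In the second case the model has no way to know that the one spawn
will turn into four processes, which is the point about the MOM not
knowing what size subcpusets to build.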

David

Brock Palen wrote:
> How do other tm launchers do it?
> 
> If OMPI is different from all the others, I could make a request to OMPI
> devel to change it to match them.
> 
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985
> 
> 
> 
> On May 8, 2008, at 11:08 PM, Chris Samuel wrote:
>>
>> ----- "Lloyd Brown" <lloyd_brown at byu.edu> wrote:
>>
>>> If you do decide to use OpenMPI, though, be sure that the version you
>>> install has the TM interface enabled.
>>
>> Also be aware that you don't want to use the current
>> cpusets implementation in Torque 2.3.0 with OpenMPI
>> as it stands.
>>
>> This is because the current implementation assumes
>> that there will be a tm_spawn() for each MPI child on
>> each node and can put each into a single CPU cpuset
>> based on the vnode.
>>
>> Unfortunately OpenMPI (and probably LAM too) makes
>> only one tm_spawn() per *node* it will be using, to
>> start its daemon (orted), and that daemon then spawns
>> the MPI processes on the node.
>>
>> So with the current implementation you end up with
>> all the MPI processes on a node sharing a single CPU,
>> which is a Bad Thing(tm).
>>
>> cheers!
>> Chris
>> --
>> Christopher Samuel - (03) 9925 4751 - Systems Manager
>>  The Victorian Partnership for Advanced Computing
>>  P.O. Box 201, Carlton South, VIC 3053, Australia
>> VPAC is a not-for-profit Registered Research Agency
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>>
> 
