[torqueusers] jobs completing with processes still running - SOLVED

Brock Palen brockp at umich.edu
Fri May 9 07:06:43 MDT 2008


How do other TM launchers do it?

If OMPI is different from all the others, I could make a request to OMPI
devel to change it to match them.

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



On May 8, 2008, at 11:08 PM, Chris Samuel wrote:
>
> ----- "Lloyd Brown" <lloyd_brown at byu.edu> wrote:
>
>> If you do decide to use OpenMPI, though, be sure that the version you
>> install has the TM interface enabled.
>
> Also be aware that you don't want to use the current
> cpusets implementation in Torque 2.3.0 with OpenMPI
> as it stands.
>
> This is because the current implementation assumes
> that there will be a tm_spawn() for each MPI child on
> each node and can put each into a single-CPU cpuset
> based on the vnode.
>
> Unfortunately OpenMPI (and probably LAM too) makes
> only one tm_spawn() call per *node* it will be using, to
> start its daemon (orted), and that daemon then spawns
> the MPI processes on the node.
>
> So with the current implementation you end up with
> all the MPI processes on a node sharing a single CPU,
> which is a Bad Thing(tm).
>
> cheers!
> Chris
> -- 
> Christopher Samuel - (03) 9925 4751 - Systems Manager
>  The Victorian Partnership for Advanced Computing
>  P.O. Box 201, Carlton South, VIC 3053, Australia
> VPAC is a not-for-profit Registered Research Agency
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>
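
To make the behaviour Chris describes concrete, here is a toy Python model of it. This is only a sketch under stated assumptions: it does not call the TM API or touch real cpusets, the function name assign_cpusets and the node name "node1" are made up for illustration, and the only rule it encodes is the one from the message above (each tm_spawn() on a node is placed in its own single-CPU cpuset, and anything the spawned task forks stays in that cpuset).

```python
from collections import defaultdict


def assign_cpusets(spawns):
    """Toy model, NOT Torque's actual code.

    spawns: list of (node, nprocs) pairs, one pair per tm_spawn() call,
            where nprocs is the number of MPI processes that end up
            running as a result of that single spawn.
    Returns {node: {cpu_index: number_of_mpi_processes}}.
    """
    next_cpu = defaultdict(int)   # next unused single-CPU cpuset per node
    load = defaultdict(dict)      # node -> cpu -> MPI process count
    for node, nprocs in spawns:
        cpu = next_cpu[node]      # each tm_spawn() gets its own CPU
        next_cpu[node] += 1
        # Children inherit the cpuset of the task that forked them.
        load[node][cpu] = load[node].get(cpu, 0) + nprocs
    return {node: dict(cpus) for node, cpus in load.items()}


# Launcher that calls tm_spawn() once per MPI process
# (what the cpuset code is said to assume).
per_process = assign_cpusets([("node1", 1)] * 4)

# Launcher that calls tm_spawn() once per node for a daemon
# which then forks 4 MPI processes itself (the orted pattern).
per_node = assign_cpusets([("node1", 4)])

print(per_process)  # {'node1': {0: 1, 1: 1, 2: 1, 3: 1}}
print(per_node)     # {'node1': {0: 4}}
```

In the second case all four MPI processes land on CPU 0, which is the sharing problem described above. The daemon itself is not counted in the model, only the MPI processes.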
