[torqueusers] tm enabled mpirun for mpt?

Brock Palen brockp at umich.edu
Mon Sep 10 08:52:06 MDT 2007


Yeah, we have seen the crazy VM numbers for MPT; that's nuts. It's
not being picked up by the MOM on our system, so I'll have to go run
some tests again; maybe it was a fluke.  Thanks for the input.
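
One quick test (a rough sketch; the job id 1234.server and a.out are
just placeholders) would be to compare what the MOM has accounted for
the job against what the kernel reports for the ranks:

    # what pbs_mom has recorded for the job
    qstat -f 1234.server | grep -E 'resources_used\.(mem|vmem)'

    # what ps reports for the MPI ranks themselves
    ps -C a.out -o pid,vsz,rss,comm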

Brock Palen
Center for Advanced Computing
brockp at umich.edu
(734)936-1985


On Sep 10, 2007, at 10:39 AM, David Singleton wrote:

>
> It's not really possible to replace the MPT mpirun with a tm-enabled
> one.  There is a startup protocol between mpirun and the shepherd
> MPI task that would be hard to replicate (you'll notice that there
> are actually N+1 processes for an N process job - the extra one is
> the shepherd).
>
> I think if you only have one host (and users do not specify the array
> and host on the mpirun command line) then all the MPI processes are
> children of mpirun and hence known to the MOM.  So you should be
> seeing all the job memory use.  Unfortunately the memory use of MPT
> processes is a bit strange since they map each other's memory to do
> zero-copy message passing.  The virtual memory blows out with every
> process having a VMA of 2-4GB for every other process (we currently
> have a 200-cpu job where every process is supposedly 660GB).  RSS is
> also an overestimate since any page involved in a message pass will be
> counted in the physical memory of at least two processes.  So I would
> imagine torque is actually overcounting job memory use, not
> undercounting.
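>
> For instance, a rough sketch (untested; $pid below is a placeholder
> for the PID of one rank) that sums the per-mapping numbers in
> /proc/$pid/smaps and splits shared from private pages:
>
>     awk '/^Size:/ {vsz+=$2} /^Rss:/ {rss+=$2}
>          /^Shared_/ {shr+=$2} /^Private_/ {prv+=$2}
>          END {printf "VSZ %d kB  RSS %d kB (%d kB shared, %d kB private)\n",
>               vsz, rss, shr, prv}' /proc/$pid/smaps
>
> Most of each rank's RSS should turn out to be shared pages, i.e. the
> same physical memory counted again in every peer that maps it.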
>
> You could use MPI_MEMMAP_OFF to turn off memory mapping.  Or you
> could just run OpenMPI or LAM using shared memory segments although
> you may lose some of the NUMA-awareness of MPT.
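>
> As a minimal sketch of the first option (the resource request and
> ./a.out are placeholders, and I'm assuming MPT's usual convention that
> setting the variable to any value turns the feature off):
>
>     #PBS -l ncpus=4
>     #PBS -N memmap_off
>     cd $PBS_O_WORKDIR
>     export MPI_MEMMAP_OFF=1    # give up zero-copy mapping for saner accounting
>     mpirun -np 4 ./a.out
>
> The trade-off is presumably extra copying on each message once the
> zero-copy path is gone.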
>
> Cheers,
> David
>
>
> Brock Palen wrote:
>> We have torque+maui on an SGI Altix.
>> We are enforcing memory use with maui and killing off jobs that go
>> over the memory they requested.  The problem is that SGI's mpirun
>> that came with MPT does not use TM, so all the processes are outside
>> the control of torque/maui, resulting in users sucking up memory and
>> overrunning other users!
>> Note we don't have cpusets working; it's ProPack 5.
>> Any insight from torque users would be wonderful.
>> Brock Palen
>> Center for Advanced Computing
>> brockp at umich.edu
>> (734)936-1985
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>
>


