[Mauiusers] maui limits? looking for experience

Gus Correa gus at ldeo.columbia.edu
Wed Sep 28 17:56:56 MDT 2011


Hi Arnau, Jason

Well, I guess I should consider myself happy
to administer only small clusters. :)

Now, how about the [terse] guidance in the Maui Admin Guide for large 
clusters?
http://www.adaptivecomputing.com/resources/docs/maui/a.ilargeclusters.php
And the [slightly more verbose] one for Torque:
http://www.adaptivecomputing.com/resources/docs/torque/a.flargeclusters.php

Would they help with scalability?
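
For reference, the poll interval Arnau mentions and the per-user limits
suggested earlier in the thread are both set in maui.cfg. A minimal sketch,
assuming a stock Maui 3.3 setup; the values are illustrative placeholders,
not tested recommendations, and would need tuning per site:

```
# maui.cfg (fragment), illustrative values only (assumptions, not advice)

# Poll the resource manager less often, so that a long scheduling
# cycle has a chance to finish before the next one is due.
RMPOLLINTERVAL        00:02:00

# Cap the number of idle (queued) jobs per user that Maui considers
# in each iteration, which reduces the work done per cycle.
USERCFG[DEFAULT]      MAXIJOB=50
```

Whether either setting is enough at ~3k running plus ~3k queued jobs is
something only testing on the actual cluster can show.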

Cheers,
Gus Correa


Jason Williams wrote:
> I've noticed similar things when my cluster gets loaded too.  I find it 
> annoying that if maui gets behind, and "misses" scheduler iterations, 
> because it's working on high job turnaround, it has to catch up on the 
> missed iterations.  Also, while maui is scheduling things, there is what 
> appears to be a type of global "lock" or block on all communications to 
> maui.  So if you get very busy, and start missing many iterations, it 
> can sometimes take 30 minutes to over an hour before maui starts 
> responding again.  To users, this may look like a deadlock, but really, 
> when you look at the logs, maui is just going nuts trying to catch up.
> 
> I've been meaning to look at the code to figure out what the heck is 
> going on, but I haven't had time.
> 
> Basically, that's my long-winded way of saying "I have seen this too, 
> Arnau."  And that I don't really have a good way around it aside from 
> setting limitations as another member suggested.
> 
> --
> Jason Williams
> Sr. Systems Administrator
> Homewood HPC Cluster
> Johns Hopkins University
> 
> On 9/28/2011 10:40 AM, Arnau Bria wrote:
>> Hi all,
>>
>> we've been using torque/maui for a long time. Our initial cluster was
>> about 50 nodes and now ~350 with 3k processors.
>>
>> It had been working fine until the last cluster upgrade, when we added
>> the last 500 processors. Since then, maui client commands hang and we had
>> to increase the poll interval because the scheduling cycle took too long.
>> Now, with a system with 3k running jobs and 3k in queue, we're facing
>> more maui issues...
>>
>> So, we were wondering what maui's limits are, whether we have reached
>> any of them, and whether anyone who has already hit these limits could
>> share their experience in solving them with us.
>>
>> we're running maui-3.3-1.x86_64.
>>
>>
>> Many thanks in advance,
>> Cheers,
>> Arnau
>> _______________________________________________
>> mauiusers mailing list
>> mauiusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/mauiusers
> 


