[Mauiusers] maui limits? looking for experience
jasonw at Jhu.edu
Wed Sep 28 17:42:59 MDT 2011
I've noticed similar things when my cluster gets loaded, too. I find it
annoying that if maui gets behind and "misses" scheduler iterations
because it's busy with high job turnaround, it has to catch up on the
missed iterations. Also, while maui is scheduling, there appears to be
a kind of global "lock" that blocks all communication with maui. So if
you get very busy and start missing many iterations, it can sometimes
take anywhere from 30 minutes to over an hour before maui starts
responding again. To users this may look like a deadlock, but when you
look at the logs, maui is just going nuts trying to catch up.
I've been meaning to look at the code to figure out what the heck is
going on, but I haven't had time.
Basically, that's my long-winded way of saying "I have seen this too,
Arnau," and that I don't really have a good way around it aside from
setting limits, as another member suggested.
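
As a rough illustration of why the catch-up phase can last so long, here
is a toy model in Python. It is not taken from the maui source (which
the message above says has not been inspected); the function name and
all four parameters are hypothetical, and it simply assumes that every
missed iteration is replayed one by one while new iterations keep
coming due at the normal poll interval.

```python
def catch_up_time(poll_interval, busy_cost, normal_cost, busy_duration):
    """Toy model of a scheduler that replays missed iterations.

    All arguments are in seconds and are illustrative assumptions:
      poll_interval  -- how often an iteration is supposed to start
      busy_cost      -- how long one iteration takes under heavy load
      normal_cost    -- how long one iteration takes once load drops
      busy_duration  -- how long the heavy-load period lasts

    Returns (missed_iterations, seconds_needed_to_catch_up).
    """
    if normal_cost >= poll_interval:
        # New iterations arrive at least as fast as old ones are replayed.
        raise ValueError("backlog never drains if normal_cost >= poll_interval")

    due = busy_duration // poll_interval   # iterations that should have run
    done = busy_duration // busy_cost      # iterations that actually ran
    missed = max(0, due - done)

    # While replaying, one backlog item is cleared every normal_cost seconds
    # but a new one comes due every poll_interval seconds, so the backlog
    # shrinks at a net rate of (1/normal_cost - 1/poll_interval) per second.
    seconds = missed * normal_cost * poll_interval / (poll_interval - normal_cost)
    return missed, seconds


if __name__ == "__main__":
    # One hour of heavy load: 60 s poll interval, 300 s per busy iteration,
    # 30 s per iteration afterwards.
    missed, seconds = catch_up_time(60, 300, 30, 3600)
    print(missed, seconds / 60)  # 48 missed iterations, 48.0 minutes
```

With these made-up numbers, one hour of overload leaves 48 missed
iterations and roughly 48 minutes of catch-up. Under this model, either
a longer poll interval or limits that keep the per-iteration cost down
shrink the backlog.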
Sr. Systems Administrator
Homewood HPC Cluster
Johns Hopkins University
On 9/28/2011 10:40 AM, Arnau Bria wrote:
> Hi all,
> we've been using torque/maui for a long time. Our initial cluster had
> about 50 nodes, and it now has ~350 with 3k processors.
> It had been working fine until the last cluster upgrade, when we added
> the last 500 processors. Since then, maui client commands hang, and we
> had to increase the poll interval because the scheduling cycle took too
> long... Now, with a system with 3k running jobs and 3k in queue, we're
> facing more maui issues...
> So, we were wondering what maui's limits are, whether we have reached
> any of them, and whether anyone who has already hit these limits could
> share their experience in solving them with us.
> we're running maui-3.3-1.x86_64.
> Many thanks in advance,