[Mauiusers] Help with Maui
Ronny T. Lampert
telecaadmin at gmail.com
Thu Jun 26 07:06:24 MDT 2008
> I had a similar situation with a user submitting 7,000 jobs at a time.
> Like you point out maui can't seem to keep up with scheduling all of
> them. After posting to the list it was suggested that I create a routing
> queue in torque:
> create queue physics
> set queue physics queue_type = Route
> set queue physics acl_group_enable = True
> set queue physics route_destinations = pompeii
> set queue physics enabled = True
> set queue physics started = True
> Then for the destination queue pompeii I put in the following rule:
> set queue pompeii max_queuable = 50
> This setup is working well. Torque manages to keep 50 jobs in the
> pompeii execution queue at all times. Maui is happy since it doesn't
> have to go through thousands of jobs each iteration, which it couldn't
> run anyhow due to lack of resources. (I wish we had thousands ;-)).
Please note that preemption triggered by ANY newer jobs will NO LONGER
WORK with this setup, since maui only applies its scheduling algorithms
to the 50 jobs in the execution queue; everything still waiting in the
routing queue is invisible to it.
The same goes for higher-priority jobs or similar that will/must/should be
executed first.
You essentially turn your setup into a "50 jobs at a time" batching system.
So, depending on your needs, you should increase max_queuable.
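
To illustrate the effect, here is a toy Python model. It is not torque or
maui code: the function name simulate, its parameters, and the assumption
that each started job frees its queue slot immediately are all made up for
this sketch. It only shows that a scheduler which sees just the capped
execution queue cannot prioritize a job still sitting in the routing queue.

```python
from collections import deque


def simulate(jobs, max_queuable, slots_per_iteration=1):
    """Return job names in the order a toy scheduler would start them.

    jobs: list of (name, priority) tuples in submission order.
    max_queuable: cap on jobs held in the execution queue.
    slots_per_iteration: jobs the scheduler can start per iteration.
    """
    routing = deque(jobs)  # routing queue: FIFO, never seen by the scheduler
    execution = []         # execution queue: the only jobs the scheduler sees
    started = []
    while routing or execution:
        # Routing step: top up the execution queue in submission order.
        while routing and len(execution) < max_queuable:
            execution.append(routing.popleft())
        # Scheduling step: highest priority first among the VISIBLE jobs.
        # The sort is stable, so equal priorities keep submission order.
        execution.sort(key=lambda job: -job[1])
        for _ in range(min(slots_per_iteration, len(execution))):
            started.append(execution.pop(0)[0])
    return started


# Six ordinary jobs, then one urgent job submitted last.
jobs = [("low%d" % i, 1) for i in range(6)] + [("urgent", 100)]

print("capped:  ", simulate(jobs, max_queuable=2))
print("uncapped:", simulate(jobs, max_queuable=100))
```

With the cap at 2, the urgent job only starts once it has worked its way
through the routing queue; with a cap larger than the backlog it starts
first. A real setup is messier, but the trade-off is the same one you tune
with max_queuable.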
Before maui I managed to run a heavily patched pbs_sched (early torque
releases) with I think around 20k+ jobs queued.
After that I abandoned that setup because I needed preemption (sorry, no
docs left from that time).
I had maui running with 10k+ jobs (and changed the #define so it would
consider 8K instead of 4K jobs for real scheduling), but it's not nice
and it'll eat memory like it's sugar (500MB+ RSS).
And I still think a limit of 8K scheduled jobs is far too low for such a system.
Because I no longer have this setup in operation ATM, I stopped working
privately on maui to remedy those shortcomings.