[torqueusers] maui + torque job start rate
Ling C. Ho
ling at fnal.gov
Wed Apr 1 08:33:46 MDT 2009
Our single-np jobs take 3-6 seconds to start when there are already jobs running on the worker
nodes (configured as 2-8 nps). Even using qrun -a, it still takes close to 2 seconds to start.
Starting jobs on a batch of "free" worker nodes is really fast, but most of the time we have some jobs
already running on the worker nodes. Would Moab ASYNCSTART help in this case? Do the jobs actually
get started, or are they just being pushed to Torque at a higher rate?
Josh Butikofer wrote:
> First of all, what is the average size of these jobs? Are they single node jobs, or is there a good mix between parallel and single node jobs? A parallel job will take a bit longer to start up because the sisters need to be contacted by the mother superior, etc.
> Yeah, Moab's ASYNCSTART option really does help. There are a few other options that can also give a speed boost. In our best tests, Moab & TORQUE can start 50 jobs/sec. I haven't tried the same benchmark with Maui. I'll look through my benchmark setup to see if there are more options/tweaks that Maui can take advantage of.
> Josh Butikofer
> Cluster Resources, Inc.
> ----- "Stijn De Weirdt" <stijn.deweirdt at ugent.be> wrote:
>> hi all,
>> (this is a crosspost to both maui and torque users list)
>> we are having issues with the job start rate using maui+torque.
>> a job takes on average 2 seconds, which is slow for what our users are
>> dumping in our queues.
>> with a job start i mean the following cycle
>> 04/01 10:01:08 MRMJobStart(374900,Msg,SC)
>> 04/01 10:01:08 MPBSJobStart(374900,gengar,Msg,SC)
>> 04/01 10:01:08
>> 04/01 10:01:10 MPBSJobModify(374900,Resource_List,Resource,1)
>> 04/01 10:01:10 INFO: job '374900' successfully started
>> 04/01 10:01:10 INFO: command sent to server
>> 04/01 10:01:10 INFO: response received from server
>> i've already tried to follow the "large cluster" tuning tips to see if
>> it helps, but with no real result. (the only tip that might solve the
>> problem is the asyncstart option from moab ;). (we have a 200 node,
>> core/node cluster (i actually don't think this is "large"))
>> anyway, before i dig into the code looking for options, i'm wondering
>> what other people are seeing as minimal start time, so i know if it is
>> possible at all.
>> many thanks,
>> torqueusers mailing list
>> torqueusers at supercluster.org
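To put numbers on the "2 seconds per job" figure discussed above, one option is to measure the gap between the MRMJobStart line and the "successfully started" line for each job in the maui log. The sketch below assumes the log lines look exactly like the excerpt quoted in this thread ("MM/DD HH:MM:SS message"); the regular expressions, the function names, and the choice of MRMJobStart as the start marker are illustrative assumptions, not part of maui. Because the log has one-second resolution, the latencies it reports are only accurate to about a second.

```python
import re
from datetime import datetime

# Assumed log line shape, based on the maui excerpt in this thread:
#   "MM/DD HH:MM:SS <message>"
LINE_RE = re.compile(r"^(\d\d/\d\d \d\d:\d\d:\d\d)\s+(.*)$")
# Assumed markers for the beginning and end of one job start cycle.
START_RE = re.compile(r"MRMJobStart\((\d+),")
DONE_RE = re.compile(r"INFO: job '(\d+)' successfully started")


def parse_ts(stamp, year=2009):
    """Parse an 'MM/DD HH:MM:SS' stamp; the log omits the year, so one is supplied."""
    return datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")


def start_latencies(lines, year=2009):
    """Return {job_id: seconds} from MRMJobStart to 'successfully started'."""
    pending = {}  # job id -> time its MRMJobStart line was logged
    result = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines without a timestamp followed by a message
        ts = parse_ts(m.group(1), year)
        msg = m.group(2)
        s = START_RE.search(msg)
        d = DONE_RE.search(msg)
        if s:
            pending[s.group(1)] = ts
        elif d and d.group(1) in pending:
            job = d.group(1)
            result[job] = (ts - pending.pop(job)).total_seconds()
    return result


if __name__ == "__main__":
    # Lines taken from the log excerpt quoted above.
    log = """04/01 10:01:08 MRMJobStart(374900,Msg,SC)
04/01 10:01:08 MPBSJobStart(374900,gengar,Msg,SC)
04/01 10:01:10 INFO: job '374900' successfully started"""
    lat = start_latencies(log.splitlines())
    print(lat)  # {'374900': 2.0}
    print(sum(lat.values()) / len(lat))  # mean start time in seconds
```

Run over a full day of logs, the mean and spread of these values show whether the delay is constant per job or only appears when the target nodes already have jobs running.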