[Mauiusers] Re: [torqueusers] maui + torque job start rate

Ling C. Ho ling at fnal.gov
Thu Apr 9 11:35:54 MDT 2009


Argh, I recreated your patch by hand and didn't notice you had changed "MasterHost" to "HostList"
in the pbs_asyrunjob call. This all makes sense now, and it works beautifully on my test setup.

Thank you all!

...
ling

Tom Rudwick wrote:

> For the async call the hostlist is passed in. I guess if someone is making
> the changes configurable, they would have to choose one method or the other
> for the synchronous call style, either compatible with the current sequence,
> or the faster method that eliminates the MPBSJobModify calls.
> 
> Josh Butikofer wrote:
>> Actually, I just checked out the Maui source code and it looks like 
>> you will need to keep at least one of the neednodes calls (the one 
>> before the call to pbs_runjob()), as Maui is not passing a host list 
>> into pbs_runjob(). If Maui does pass in the hostlist to pbs_runjob(), 
>> the neednodes calls are probably not needed.
>>
>> Josh Butikofer
>> Cluster Resources, Inc.
>> #############################
>>
>>
>> Josh Butikofer wrote:
>>> Tom is right that the "neednodes" modification is no longer needed 
>>> for newer versions of TORQUE. You should, in fact, be able to remove 
>>> any MPBSJobModify() code that changes "neednodes". I don't have the 
>>> Maui code in front of me, but if both MPBSJobModify() calls deal with 
>>> neednodes, you should be able to safely remove both of them if using 
>>> a newer version of TORQUE.
>>>
>>> Josh Butikofer
>>> Cluster Resources, Inc.
>>> #############################
>>>
>>>
>>> Ling C. Ho wrote:
>>>> Yes, I meant the second MPBSJobModify, not MPBSJobStart. So if I 
>>>> need maui to still assign the nodes for me (using 
>>>> NODEALLOCATIONPOLICY), could I still use both MPBSJobModify()'s, and 
>>>> just change pbs_runjob() to pbs_asyrunjob()?
>>>>
>>>> Thanks for your quick reply.
>>>>
>>>> ...
>>>> ling
>>>>
>>>>
>>>>
>>>> Tom Rudwick wrote:
>>>>
>>>>> If you mean the second MPBSJobModify, my understanding is that that
>>>>> call was supposed to work around an old bug in PBS.
>>>>>
>>>>> Tom
>>>>>
>>>>>
>>>>> Ling C. Ho wrote:
>>>>>> Hi Tom,
>>>>>>
>>>>>> In your patch, you have commented out both MPBSJobModify calls
>>>>>> before and after pbs_asystart(). I can understand the first
>>>>>> MPBSJobStart(), which sets the node where the job should run. What
>>>>>> is the purpose of the second MPBSJobStart, as it sets neednodes
>>>>>> to 1?
>>>>>>
>>>>>> Thanks,
>>>>>> ...
>>>>>> ling
>>>>>>
>>>>>> Tom Rudwick wrote:
>>>>>>
>>>>>>> If you search the maui list archives for my asynchronous job start
>>>>>>> patch you can increase that speed greatly.
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>>
>>>>>>> Stijn De Weirdt wrote:
>>>>>>>> hi all,
>>>>>>>>
>>>>>>>> (this is a crosspost to both maui and torque users list)
>>>>>>>>
>>>>>>>> we are having issues with the job start rate using maui+torque.
>>>>>>>> starting a job takes on average 2 seconds, which is slow for what
>>>>>>>> our users are dumping in our queues.
>>>>>>>>
>>>>>>>> by a job start i mean the following cycle:
>>>>>>>> 04/01 10:01:08 MRMJobStart(374900,Msg,SC)
>>>>>>>> 04/01 10:01:08 MPBSJobStart(374900,gengar,Msg,SC)
>>>>>>>> 04/01 10:01:08 MPBSJobModify(374900,Resource_List,Resource,node088.gengar.gent.vsc)
>>>>>>>> 04/01 10:01:10 MPBSJobModify(374900,Resource_List,Resource,1)
>>>>>>>> 04/01 10:01:10 INFO:     job '374900' successfully started
>>>>>>>> 04/01 10:01:10 INFO:     command sent to server
>>>>>>>> 04/01 10:01:10 INFO:     response received from server
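To put a number on the start latency visible in a trace like the one above, here is a minimal sketch that pairs each MRMJobStart line with the matching "successfully started" line and reports the elapsed seconds. It assumes the Maui log uses exactly the "MM/DD HH:MM:SS" prefix and message wording shown in the excerpt; the function names (parse_stamp, start_latencies) and the hard-coded year are illustrative choices, not part of Maui or TORQUE.

```python
import re
from datetime import datetime

# Log excerpt copied from the message above (the wrapped MPBSJobModify line is rejoined).
SAMPLE_LOG = """\
04/01 10:01:08 MRMJobStart(374900,Msg,SC)
04/01 10:01:08 MPBSJobStart(374900,gengar,Msg,SC)
04/01 10:01:08 MPBSJobModify(374900,Resource_List,Resource,node088.gengar.gent.vsc)
04/01 10:01:10 MPBSJobModify(374900,Resource_List,Resource,1)
04/01 10:01:10 INFO:     job '374900' successfully started
"""

STAMP = r"(\d{2}/\d{2} \d{2}:\d{2}:\d{2})"
START_RE = re.compile(STAMP + r"\s+MRMJobStart\(([^,]+),")
DONE_RE = re.compile(STAMP + r"\s+INFO:\s+job '([^']+)' successfully started")

# Assumption: the log lines carry no year, so one is supplied here.
# Only differences between timestamps are used, so the exact value is not critical.
ASSUMED_YEAR = "2009"


def parse_stamp(text):
    """Turn 'MM/DD HH:MM:SS' into a datetime using the assumed year."""
    return datetime.strptime(ASSUMED_YEAR + " " + text, "%Y %m/%d %H:%M:%S")


def start_latencies(log_text):
    """Return {job_id: seconds from MRMJobStart to 'successfully started'}."""
    began = {}
    latencies = {}
    for line in log_text.splitlines():
        match = START_RE.search(line)
        if match:
            began[match.group(2)] = parse_stamp(match.group(1))
            continue
        match = DONE_RE.search(line)
        if match and match.group(2) in began:
            delta = parse_stamp(match.group(1)) - began.pop(match.group(2))
            latencies[match.group(2)] = delta.total_seconds()
    return latencies


if __name__ == "__main__":
    for job_id, seconds in start_latencies(SAMPLE_LOG).items():
        print(f"job {job_id}: {seconds:.0f} s")
```

Run against a full maui.log, the same function gives one latency per started job, which makes it easy to compare the synchronous and asynchronous start paths. The log timestamps have one-second resolution, so the result is only accurate to about a second.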
>>>>>>>>
>>>>>>>> i've already tried to follow the "large cluster" tuning tips to see
>>>>>>>> if it helps, but no real result (the only tip that might solve the
>>>>>>>> problem is the asyncstart option from moab ;). we have a 200 node,
>>>>>>>> 8 core/node cluster (i actually don't think this is "large").
>>>>>>>>
>>>>>>>> anyway, before i dig in the code looking for options, i'm wondering
>>>>>>>> what other people are seeing as minimal start time, so i know if it
>>>>>>>> is possible at all.
>>>>>>>>
>>>>>>>> many thanks,
>>>>>>>>
>>>>>>>> stijn
>>>>>>>>   
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> torqueusers mailing list
>>>>>>> torqueusers at supercluster.org
>>>>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> mauiusers mailing list
>>>> mauiusers at supercluster.org
>>>> http://www.supercluster.org/mailman/listinfo/mauiusers
> 



