[torqueusers] maui + torque job start rate

Ling C. Ho ling at fnal.gov
Thu Apr 9 10:27:05 MDT 2009


Yes, I meant the second MPBSJobModify, not MPBSJobStart. So if I still need maui to assign the nodes
for me (using NODEALLOCATIONPOLICY), could I keep both MPBSJobModify() calls and just change
pbs_runjob() to pbs_asyrunjob()?

Thanks for your quick reply.

...
ling



Tom Rudwick wrote:

> If you mean the second MPBSJobModify, my understanding is that that call
> was supposed to work around an old bug in PBS.
> 
> Tom
> 
> 
> Ling C. Ho wrote:
>> Hi Tom,
>>
>> In your patch, you have commented out both MPBSJobModify calls before
>> and after pbs_asystart(). I can understand the first MPBSJobStart(),
>> which sets the node where the job should run. What is the purpose of
>> the second MPBSJobStart, as it sets neednodes to 1?
>>
>> Thanks,
>> ...
>> ling
>>
>> Tom Rudwick wrote:
>>
>>> If you search the maui list archives for my asynchronous job start patch
>>> you can increase that speed greatly.
>>>
>>> Tom
>>>
>>>
>>> Stijn De Weirdt wrote:
>>>> hi all,
>>>>
>>>> (this is a crosspost to both maui and torque users list)
>>>>
>>>> we are having issues with the job start rate using maui+torque. starting
>>>> a job takes on average 2 seconds, which is slow for what our users are
>>>> dumping in our queues.
>>>>
>>>> by a job start i mean the following cycle:
>>>> 04/01 10:01:08 MRMJobStart(374900,Msg,SC)
>>>> 04/01 10:01:08 MPBSJobStart(374900,gengar,Msg,SC)
>>>> 04/01 10:01:08 MPBSJobModify(374900,Resource_List,Resource,node088.gengar.gent.vsc)
>>>> 04/01 10:01:10 MPBSJobModify(374900,Resource_List,Resource,1)
>>>> 04/01 10:01:10 INFO:     job '374900' successfully started
>>>> 04/01 10:01:10 INFO:     command sent to server
>>>> 04/01 10:01:10 INFO:     response received from server
>>>>
>>>> i've already tried to follow the "large cluster" tuning tips to see if
>>>> it helps, but no real result (the only tip that might solve the
>>>> problem is the asyncstart option from moab ;). we have a 200 node, 8
>>>> core/node cluster (i actually don't think this is "large").
>>>>
>>>> anyway, before i dig in the code looking for options, i'm wondering what
>>>> other people are seeing as minimal start time, so i know if it is
>>>> possible at all.
>>>>
>>>> many thanks,
>>>>
>>>> stijn
>>>>   
>>>
>>
>>
> 
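
To see where the 2 seconds in the quoted start cycle go, the log excerpt can be
timed line by line. The following is only a rough sketch in Python, not part of
maui or torque: the function names parse_log and step_gaps are made up for
illustration, the year 2009 is assumed because the log lines carry no year, and
the split MPBSJobModify line is joined back onto one line.

```python
from datetime import datetime

# Log excerpt copied from the quoted message (job 374900). The
# MPBSJobModify line that was wrapped in the mail is joined here.
LOG = """\
04/01 10:01:08 MRMJobStart(374900,Msg,SC)
04/01 10:01:08 MPBSJobStart(374900,gengar,Msg,SC)
04/01 10:01:08 MPBSJobModify(374900,Resource_List,Resource,node088.gengar.gent.vsc)
04/01 10:01:10 MPBSJobModify(374900,Resource_List,Resource,1)
04/01 10:01:10 INFO:     job '374900' successfully started
"""

ASSUMED_YEAR = "2009"  # assumption: the log itself has no year field


def parse_log(text):
    """Return a list of (timestamp, message) pairs, one per non-empty line."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # "MM/DD HH:MM:SS" is 14 characters, followed by one space.
        stamp, message = line[:14], line[15:]
        when = datetime.strptime(ASSUMED_YEAR + "/" + stamp, "%Y/%m/%d %H:%M:%S")
        entries.append((when, message))
    return entries


def step_gaps(entries):
    """Seconds between each log entry and the next, paired with the earlier message."""
    return [
        ((later - earlier).total_seconds(), message)
        for (earlier, message), (later, _) in zip(entries, entries[1:])
    ]


entries = parse_log(LOG)
gaps = step_gaps(entries)

total_seconds = (entries[-1][0] - entries[0][0]).total_seconds()
slowest_gap, slowest_step = max(gaps, key=lambda gap: gap[0])
# Upper bound on the start rate if job starts are strictly serial.
jobs_per_hour = 3600 / total_seconds

print(f"start cycle: {total_seconds:.0f} s, about {jobs_per_hour:.0f} job starts/hour")
print(f"longest gap: {slowest_gap:.0f} s after {slowest_step.split('(')[0]}")
```

For this excerpt the whole cycle is 2 s, which caps a strictly serial scheduler
at roughly 1800 job starts per hour, and the entire gap sits between the first
and the second MPBSJobModify log lines. The log has one-second resolution and
does not show which call inside that window is blocking, so this only narrows
down where to look.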