[torqueusers] maui + torque job start rate
Ling C. Ho
ling at fnal.gov
Thu Apr 9 10:27:05 MDT 2009
Yes, I meant the second MPBSJobModify, not MPBSJobStart. So if I still need maui to assign the nodes
for me (using NODEALLOCATIONPOLICY), could I keep both MPBSJobModify() calls and just change
pbs_runjob() to pbs_asyrunjob()?
Thanks for your quick reply.
...
ling
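
For anyone who wants to measure the start latency discussed in this thread on their own system, here is a rough sketch that reads it off the maui log timestamps. It uses the log excerpt from Stijn's original post (quoted further down) as sample input. The function name start_latencies, the two regular expressions, and the year 2009 (taken from the date of this post, since the log lines carry no year) are assumptions made for illustration; they are not part of maui or torque.

```python
import re
from datetime import datetime

# Sample input: the maui log excerpt quoted later in this thread.
LOG = """\
04/01 10:01:08 MRMJobStart(374900,Msg,SC)
04/01 10:01:08 MPBSJobStart(374900,gengar,Msg,SC)
04/01 10:01:08 MPBSJobModify(374900,Resource_List,Resource,node088.gengar.gent.vsc)
04/01 10:01:10 MPBSJobModify(374900,Resource_List,Resource,1)
04/01 10:01:10 INFO: job '374900' successfully started
"""

# Assumed line shapes, inferred only from the excerpt above.
START_RE = re.compile(r"^(\d\d/\d\d \d\d:\d\d:\d\d) MRMJobStart\((\d+),")
DONE_RE = re.compile(
    r"^(\d\d/\d\d \d\d:\d\d:\d\d) INFO: job '(\d+)' successfully started"
)


def parse_stamp(stamp, year=2009):
    """Parse a 'MM/DD HH:MM:SS' stamp; the year is an assumption."""
    return datetime.strptime(f"{year}/{stamp}", "%Y/%m/%d %H:%M:%S")


def start_latencies(log_text):
    """Return {job_id: seconds between MRMJobStart and 'successfully started'}."""
    started = {}
    latencies = {}
    for line in log_text.splitlines():
        m = START_RE.match(line)
        if m:
            started[m.group(2)] = parse_stamp(m.group(1))
            continue
        m = DONE_RE.match(line)
        if m and m.group(2) in started:
            elapsed = parse_stamp(m.group(1)) - started[m.group(2)]
            latencies[m.group(2)] = elapsed.total_seconds()
    return latencies


if __name__ == "__main__":
    print(start_latencies(LOG))
```

The log only has one-second resolution, so this gives a coarse figure per job; averaging over many jobs would give a better picture of the overall start rate.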
Tom Rudwick wrote:
> If you mean the second MPBSJobModify, my understanding is that that call
> was supposed to work around an old bug in PBS.
>
> Tom
>
>
> Ling C. Ho wrote:
>> Hi Tom,
>>
>> In your patch, you have commented out both MPBSJobModify calls before
>> and after pbs_asystart(). I can understand the first MPBSJobStart(),
>> which sets the node where the job should run. What is the purpose of
>> the second MPBSJobStart, given that it sets neednodes to 1?
>>
>> Thanks,
>> ...
>> ling
>>
>> Tom Rudwick wrote:
>>
>>> If you search the maui list archives for my asynchronous job start patch
>>> you can increase that speed greatly.
>>>
>>> Tom
>>>
>>>
>>> Stijn De Weirdt wrote:
>>>> hi all,
>>>>
>>>> (this is a crosspost to both maui and torque users list)
>>>>
>>>> we are having issues with the job start rate using maui+torque.
>>>> starting a job takes on average 2 seconds, which is slow for what
>>>> our users are dumping in our queues.
>>>>
>>>> by a job start i mean the following cycle:
>>>> 04/01 10:01:08 MRMJobStart(374900,Msg,SC)
>>>> 04/01 10:01:08 MPBSJobStart(374900,gengar,Msg,SC)
>>>> 04/01 10:01:08 MPBSJobModify(374900,Resource_List,Resource,node088.gengar.gent.vsc)
>>>> 04/01 10:01:10 MPBSJobModify(374900,Resource_List,Resource,1)
>>>> 04/01 10:01:10 INFO: job '374900' successfully started
>>>> 04/01 10:01:10 INFO: command sent to server
>>>> 04/01 10:01:10 INFO: response received from server
>>>>
>>>> i've already tried to follow the "large cluster" tuning tips to see if
>>>> it helps, but no real result (the only tip that might solve the
>>>> problem is the asyncstart option from moab ;). we have a 200 node, 8
>>>> core/node cluster (i actually don't think this is "large").
>>>>
>>>> anyway, before i dig in the code looking for options, i'm wondering what
>>>> other people are seeing as minimal start time, so i know if it is
>>>> possible at all.
>>>>
>>>> many thanks,
>>>>
>>>> stijn
>>>>