[Mauiusers] maui stops scheduling when finds a non real busy node

"Hung-Sheng Tsao (Lao Tsao 老曹) Ph.D." laotsao at gmail.com
Wed Sep 14 08:39:57 MDT 2011


Please post your Maui configuration file and your Torque setup.
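
It may also help to summarise which hosts the rejections point at. The following is a minimal sketch in Python, assuming the Maui log lines have the same shape as the ERROR lines quoted below (rc, errmsg and REJHOST fields). The names REJECT_RE, rejected_hosts and SAMPLE are made up for illustration and are not part of Maui or Torque.

```python
import re
from collections import Counter

# Assumed line shape, taken from the ERROR lines quoted in this thread:
#   ERROR:    job '<id>' cannot be started: (rc: <code>  errmsg: '... REJHOST=<host> ...
REJECT_RE = re.compile(
    r"ERROR:\s+job '(?P<job>\d+)' cannot be started: "
    r"\(rc: (?P<rc>\d+)\s+errmsg: '.*?REJHOST=(?P<host>\S+)"
)


def rejected_hosts(lines):
    """Count how many job starts were rejected per host (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = REJECT_RE.search(line)
        if match:
            counts[match.group("host")] += 1
    return counts


# Sample lines copied from the quoted Maui log.
SAMPLE = [
    "09/14 16:20:48 INFO:     job '20420500' successfully started",
    "09/14 16:20:48 ERROR:    job '20420265' cannot be started: (rc: 15046  "
    "errmsg: 'Resource temporarily unavailable REJHOST=td578.pic.es "
    "MSG=cannot allocate node 'td578.pic.es' to job - node not currently "
    "available (state: busy)'  hostlist: 'td578.pic.es')",
    "09/14 16:20:48 ERROR:    cannot start job '20420265' in partition DEFAULT",
    "09/14 16:20:48 ERROR:    job '20420306' cannot be started: (rc: 15046  "
    "errmsg: 'Resource temporarily unavailable REJHOST=td578.pic.es "
    "MSG=cannot allocate node 'td578.pic.es' to job - node not currently "
    "available (state: busy)'  hostlist: 'td578.pic.es')",
]

if __name__ == "__main__":
    for host, count in rejected_hosts(SAMPLE).most_common():
        print(host, count)
```

To run it against a real log, pass an open file object to rejected_hosts() instead of SAMPLE. If a single host accounts for every rejection in a cycle, that points at the node rather than at the scheduler policy.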


On 9/14/2011 10:32 AM, Arnau Bria wrote:
> Hi all,
>
> We recently updated our Torque version to 2.5.8 but, since this looks
> like a Maui issue, I am asking here first.
>
> Our combination is:
> # rpm -qa|egrep 'maui-server|torque-server'
> maui-server-3.3-1.x86_64
> torque-server-2.5.8-1.cri.x86_64
>
> Maui works fine, but if it finds a node in busy status during a
> scheduling cycle, it does not schedule any other job in that cycle:
>
> 09/14 16:20:48 INFO:     job '20420500' successfully started
> 09/14 16:20:48 MRMJobStart(20420265,Msg,SC)
> 09/14 16:20:48 MPBSJobStart(20420265,base,Msg,SC)
> 09/14 16:20:48 ERROR:    job '20420265' cannot be started: (rc: 15046  errmsg: 'Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)'  hostlist: 'td578.pic.es')
> 09/14 16:20:48 ERROR:    cannot start job '20420265' in partition DEFAULT
> 09/14 16:20:48 MJobPReserve(20420265,DEFAULT,ResCount,ResCountRej)
> 09/14 16:20:48 INFO:     no priority reservations created (bf/rsv policy)
> 09/14 16:20:48 MRMJobStart(20420306,Msg,SC)
> 09/14 16:20:48 MPBSJobStart(20420306,base,Msg,SC)
> 09/14 16:20:48 ERROR:    job '20420306' cannot be started: (rc: 15046  errmsg: 'Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)'  hostlist: 'td578.pic.es')
> 09/14 16:20:48 ERROR:    cannot start job '20420306' in partition DEFAULT
> 09/14 16:20:48 MJobPReserve(20420306,DEFAULT,ResCount,ResCountRej)
> 09/14 16:20:48 INFO:     no priority reservations created (bf/rsv policy)
> 09/14 16:20:48 MRMJobStart(20420268,Msg,SC)
>
>
>
> Torque says that the node is busy:
>
> 09/14/2011 03:00:13;0008;PBS_Server;Job;20401629.pbs03.pic.es;could not locate requested resources 'td578.pic.es' (node_spec failed) cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)
> 09/14/2011 03:00:13;0080;PBS_Server;Req;req_reject;Reject reply code=15046(Resource temporarily unavailable REJHOST=td578.pic.es MSG=cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)), aux=0, type=RunJob, from root at pbs03.pic.es
> 09/14/2011 03:00:13;0008;PBS_Server;Job;20401630.pbs03.pic.es;could not locate requested resources 'td578.pic.es' (node_spec failed) cannot allocate node 'td578.pic.es' to job - node not currently available (state: busy)
>
>
> but the node is not really "busy". It is only busy for a few seconds
> because, when I run pbsnodes $nodename 3-4 seconds after seeing the
> error, the node shows as free.
> On the next scheduling cycle, if Maui does not find any "busy" node, all jobs are scheduled.
>
>
> I'm wondering if I could configure Maui to bypass those failing nodes
> and keep scheduling other jobs while I work out why Torque marks those
> nodes as busy when they are not.
>
>
> TIA,
> Arnau
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers