[torqueusers] jobs not being scheduled but many free slots

Arnau Bria arnaubria at pic.es
Sat Jan 3 12:12:47 MST 2009


On Sat, 03 Jan 2009 18:29:06 +0000
Craig Macdonald wrote:

> Hi,
Hi,
 
> Aside: Often the lists require moderation, and I guess the moderator
> is on holiday ;-)
Could be, but I have submitted in the past with no problem... Anyway,
I'll ask after the holidays...


> I think the problem is that while the node is free, the loadavg on
> the node suggests otherwise:
> 
> pbsnodes reports
> 	loadave=1.64
> 
> maui reports
>       Load: 3.170
> (I guess a time difference in your grabs there).
Yes, that could be. Now both values look similar:
pbs:

 loadave=3.25,

maui:
ALERT:  node is in state Idle but load is high (3.000)


> There must be other processes running on the node - either not
> managed by torque, or that haven't been killed properly by torque
> (some people use kill epilogue scripts).

Now it is node td240, and that node has 2 jobs:
# pbsnodes td240.pic.es
td240.pic.es
     state = free
     np = 4
     properties = slc4,magic
     ntype = cluster
     jobs = 1/1663053.pbs02.pic.es, 3/1663054.pbs02.pic.es

and I don't see any orphan processes...
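
The slot count in the pbsnodes output above can be double-checked with a minimal Python sketch. It assumes the plain-text layout shown above (an `np = N` line and a comma-separated `jobs = slot/jobid` line); the function name `slot_usage` is made up for illustration and is not part of TORQUE.

```python
import re

def slot_usage(pbsnodes_text):
    """Return (used_slots, total_slots) for one pbsnodes node record."""
    np_match = re.search(r"^\s*np\s*=\s*(\d+)\s*$", pbsnodes_text, re.MULTILINE)
    if np_match is None:
        raise ValueError("no 'np = N' line found")
    total = int(np_match.group(1))

    jobs_match = re.search(r"^\s*jobs\s*=\s*(.*)$", pbsnodes_text, re.MULTILINE)
    if jobs_match is None:
        return 0, total  # no jobs line: nothing is running on the node
    # Each "slot/jobid" entry occupies one slot.
    entries = [e.strip() for e in jobs_match.group(1).split(",") if e.strip()]
    return len(entries), total

# Node record copied from the pbsnodes output in this message.
SAMPLE = """\
td240.pic.es
     state = free
     np = 4
     properties = slc4,magic
     ntype = cluster
     jobs = 1/1663053.pbs02.pic.es, 3/1663054.pbs02.pic.es
"""

if __name__ == "__main__":
    used, total = slot_usage(SAMPLE)
    print(f"{used} of {total} slots in use")
```

For td240 this reports 2 of 4 slots in use, so half of the node is still free from torque's point of view.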

Could the problem be caused by regular jobs?
Is a worker node with a load of 3 enough of a problem to block the
whole system?
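
To make the question concrete, here is a minimal Python sketch of the kind of check involved: flag a node that pbsnodes reports as free while its `loadave=` value is at or above some threshold. The thresholds (3.0 and 5.0) are simply the numbers mentioned in this thread, the sample record is assembled by hand from the values above (a real `status` line carries many more fields), and the helper name `free_but_loaded` is made up. This is not how maui itself makes the decision.

```python
import re

def free_but_loaded(record, max_load):
    """True if the record says 'state = free' but loadave >= max_load."""
    state = re.search(r"^\s*state\s*=\s*(\S+)", record, re.MULTILINE)
    load = re.search(r"loadave=([0-9]+(?:\.[0-9]+)?)", record)
    if state is None or load is None:
        return False  # not enough information to decide
    return state.group(1) == "free" and float(load.group(1)) >= max_load

# Hand-assembled record: state and np from the pbsnodes output,
# loadave from the value quoted earlier in this message.
SAMPLE = """\
td240.pic.es
     state = free
     np = 4
     status = loadave=3.25,
"""

if __name__ == "__main__":
    for threshold in (3.0, 5.0):
        print(threshold, free_but_loaded(SAMPLE, threshold))
```

With a threshold of 3 the node is flagged, with 5 it is not, which matches what we see: the node is only treated as loaded once the limit is as low as 3.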

> You probably want to configure maui parameter NODEALLOCATIONPOLICY -
> see
> http://www.clusterresources.com/products/maui/docs/5.2nodeallocation.shtml
I had something at torque level, but IIRC it is now commented out...
Anyway, our threshold was 5, not 3, and it worked really well.

> Craig
Thanks for the reply,
Arnau

