[torqueusers] jobs not being scheduled but many free slots
Arnau Bria
arnaubria at pic.es
Sat Jan 3 12:12:47 MST 2009
On Sat, 03 Jan 2009 18:29:06 +0000
Craig Macdonald wrote:
> Hi,
Hi,
> Aside: Often the lists require moderation, and I guess the moderator
> is on holiday ;-)
Could be, but I have submitted in the past with no problem... Anyway,
I'll ask after the holidays...
> I think the problem is that while the node is free, the loadavg on
> the node suggests otherwise:
>
> pbsnodes reports
> loadave=1.64
>
> maui reports
> Load: 3.170
> (I guess a time difference in your grabs there).
Yes, that could be; now both values look similar:
pbs:
loadave=3.25,
maui:
ALERT: node is in state Idle but load is high (3.000)
> There must be other processes running on the node - either not
> managed by torque, or that haven't been killed properly by torque
> (some people use kill epilogue scripts).
Now it is node td240; that node has 2 jobs:
# pbsnodes td240.pic.es
td240.pic.es
     state = free
     np = 4
     properties = slc4,magic
     ntype = cluster
     jobs = 1/1663053.pbs02.pic.es, 3/1663054.pbs02.pic.es
and I don't see any orphan processes...
Could the problem be caused by regular jobs?
Is a WN with a load of 3 enough of a problem to block the
whole system?
> You probably want to configure maui parameter NODEALLOCATIONPOLICY -
> see
> http://www.clusterresources.com/products/maui/docs/5.2nodeallocation.shtml
I had something at the torque level, but IIRC it is now commented out...
Anyway, our threshold was 5, not 3, and it worked really well.
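
To see which nodes are in this situation (state free, but load above some
threshold), the pbsnodes output can be checked with a small script. This is
only a rough sketch: the threshold value, the function names and the
"status = loadave=..." line in the sample are assumptions for illustration,
not the actual configuration or output of this cluster.

```python
import re

# Hypothetical threshold; the thread only mentions 3 and 5 as load values.
LOAD_THRESHOLD = 3.0


def parse_pbsnodes(text):
    """Parse pbsnodes-style text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(" = ")
        if sep:
            # "key = value" lines belong to the most recent node name.
            if current is not None:
                nodes[current][key] = value
        else:
            # A line without " = " is taken to be a node name.
            current = line
            nodes[current] = {}
    return nodes


def load_of(attrs):
    """Return the loadave found in the status attribute, or None."""
    match = re.search(r"loadave=([0-9.]+)", attrs.get("status", ""))
    return float(match.group(1)) if match else None


def busy_free_nodes(nodes, threshold=LOAD_THRESHOLD):
    """List (name, load) for nodes that are free but loaded above threshold."""
    flagged = []
    for name, attrs in nodes.items():
        load = load_of(attrs)
        if attrs.get("state") == "free" and load is not None and load > threshold:
            flagged.append((name, load))
    return flagged


# Sample built from the listing above; the status line is an assumed format.
SAMPLE = """\
td240.pic.es
     state = free
     np = 4
     properties = slc4,magic
     ntype = cluster
     jobs = 1/1663053.pbs02.pic.es, 3/1663054.pbs02.pic.es
     status = loadave=3.25
"""

if __name__ == "__main__":
    for name, load in busy_free_nodes(parse_pbsnodes(SAMPLE)):
        print(name, load)
```

In practice the text would come from running pbsnodes on the server rather
than from a hard-coded sample, and the threshold should match whatever the
scheduler is really configured with.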
> Craig
Thanks for the reply,
Arnau