[torqueusers] jobs not being scheduled but many free slots
arnaubria at pic.es
Sat Jan 3 12:12:47 MST 2009
On Sat, 03 Jan 2009 18:29:06 +0000
Craig Macdonald wrote:
> Aside: Often the lists require moderation, and I guess the moderator
> is on holiday ;-)
Could be, but I have submitted in the past with no problem... Anyway,
I'll ask after the holidays...
> I think the problem is that while the node is free, the loadavg on
> the node suggests otherwise:
> pbsnodes reports
> maui reports
> Load: 3.170
> (I guess a time difference in your grabs there).
Yes, that could be; now both values look similar:
ALERT: node is in state Idle but load is high (3.000)
> There must be other processes running on the node - either not
> managed by torque, or that haven't been killed properly by torque
> (some people use kill epilogue scripts).
Now it is node td240; that node has 2 jobs:
# pbsnodes td240.pic.es
state = free
np = 4
properties = slc4,magic
ntype = cluster
jobs = 1/1663053.pbs02.pic.es, 3/1663054.pbs02.pic.es
and I don't see any orphan process...
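To make the comparison concrete, here is a rough sketch (not something I actually ran on the cluster) that parses a pbsnodes block like the one above and compares the number of running jobs with a reported load. The helper names are made up for illustration, the sample text starts with the node name as pbsnodes prints it, and I am assuming that each single-slot job accounts for roughly 1.0 of load and that the 3.000 from the Maui alert applies to this node:

```python
# Sketch only: parse_pbsnodes and load_report are hypothetical helper names.
SAMPLE = """
td240.pic.es
     state = free
     np = 4
     properties = slc4,magic
     ntype = cluster
     jobs = 1/1663053.pbs02.pic.es, 3/1663054.pbs02.pic.es
"""


def parse_pbsnodes(text):
    """Parse one node block of pbsnodes output into a dict of strings."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    node = {"name": lines[0]}
    for ln in lines[1:]:
        key, sep, value = ln.partition(" = ")
        if sep:  # skip anything that is not a "key = value" line
            node[key.strip()] = value.strip()
    return node


def load_report(node, load):
    """Compare the reported load with the number of jobs on the node.

    Assumption: each job entry is a single-slot job adding about 1.0 of load.
    """
    slots = int(node["np"])
    jobs = [j.strip() for j in node.get("jobs", "").split(",") if j.strip()]
    return {
        "np": slots,
        "running_jobs": len(jobs),
        "free_slots": slots - len(jobs),
        "unexplained_load": round(load - len(jobs), 3),
    }


node = parse_pbsnodes(SAMPLE)
report = load_report(node, load=3.0)  # 3.0 taken from the Maui alert (assumed)
print(report)
```

Under those assumptions, anything left in unexplained_load would have to come either from the regular jobs themselves (for example multi-threaded ones) or from processes outside torque.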
Could the problem be produced by regular jobs?
Is a WN with a load of 3 enough of a problem to block the
> You probably want to configure maui parameter NODEALLOCATIONPOLICY -
I had something at the torque level, but IIRC it is now commented out...
Anyway, our threshold was 5, not 3, and it worked really well.
Thanks for the reply,