[torqueusers] nodes hung (or not?)

Ronny T. Lampert telecaadmin at gmail.com
Thu Sep 21 03:20:50 MDT 2006


>>Has anyone seen this?  This is the first time in over 10 months of use with
>>torque (and 3 with maui).  If it happens again hopefully I can check more
>>logs and get a better idea.
>>Any help is greatly appreciated.
>
> I can't think of a scenario that fits this description.

But I can!
First, make sure you are running the latest torque, or at least 2.1.1, which
fixed a lot of bugs.
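
A quick way to check this is to look at the pbs_version attribute the server
reports. The snippet below is only a rough sketch: it assumes you have saved
the server status text (for example from "qstat -B -f") and that it contains a
line of the form "pbs_version = X.Y.Z". The function names are made up for
illustration and are not part of torque.

```python
import re

# Minimum version suggested in this message.
MIN_VERSION = (2, 1, 1)


def parse_pbs_version(text):
    """Return the numeric part of pbs_version as a tuple of ints, or None.

    Suffixes such as "p8" or "-snap" are ignored (assumption: only the
    dotted numeric prefix matters for this comparison).
    """
    match = re.search(r"pbs_version\s*=\s*(\d+(?:\.\d+)*)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def is_recent_enough(text, minimum=MIN_VERSION):
    """True if the reported version is at least `minimum`."""
    version = parse_pbs_version(text)
    return version is not None and version >= minimum
```

Feed it the saved status text; if is_recent_enough() returns False, upgrade
before digging any further.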

Beyond that, I had problems on the maui side of things.

Sometimes maui did not recognize free nodes, or the interval before it retried
to find a free node was too long.
Also, in maui-3.2.6p16-snap.1155916970.tar.gz a couple of timers are set to
shorter values (e.g. for "deferred" jobs), so those jobs are reconsidered for
scheduling sooner.
Retry with this snapshot and compile it against the torque version you are
actually running (the snapshot also fixed issues with preemption).
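
To tell whether the nodes are really hung or whether maui just does not see
them as free, it helps to look at what torque itself reports. Below is a
minimal sketch that parses text in the layout "pbsnodes -a" prints (node name
at the start of a line, indented "key = value" attributes below it) and lists
the nodes in a given state. The sample text and the function names are made up
for illustration.

```python
# Illustrative sample in the layout printed by "pbsnodes -a" (made-up nodes).
SAMPLE = """\
node01
     state = free
     np = 2
     ntype = cluster

node02
     state = down,offline
     np = 2
     ntype = cluster
"""


def parse_pbsnodes(text):
    """Parse pbsnodes-style text into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current node record
            continue
        if not line[0].isspace():
            current = line.strip()  # unindented line: a new node name
            nodes[current] = {}
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def nodes_in_state(nodes, wanted):
    """Return sorted names of nodes whose comma-separated state has `wanted`."""
    return sorted(
        name
        for name, attrs in nodes.items()
        if wanted in attrs.get("state", "").split(",")
    )
```

If torque lists a node as free but maui never places a job on it, the problem
is most likely on the scheduler side; compare with what maui's own diagnostics
(e.g. checknode) say about the same node.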

If you don't need maui features like preemption or reservations, then simply
go with pbs_sched, which has served me well for over 2 years and millions of
jobs.


Cheers,
Ronny


