[torqueusers] nodes hung (or not?)
Ronny T. Lampert
telecaadmin at gmail.com
Thu Sep 21 03:20:50 MDT 2006
>>Has anyone seen this? This is the first time in over 10 months of use with
>>torque (and 3 with maui). If it happens again hopefully I can check more
>>logs and get a better idea.
>>Any help is greatly appreciated.
> I can't think of a scenario that fits this description.
But I can!
First, make sure you are running the latest torque, or at least 2.1.1, which
fixed a lot of bugs.
Beyond that, I had problems on the maui side of things: sometimes maui did
not recognize free nodes, or the delay before it retried to find a free node
was too long.
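
To check whether torque itself still considers the nodes free (and so whether
the problem is on the maui side), one option is to look at the output of
"pbsnodes -a". The following is only a minimal sketch, assuming the usual
layout of that output (node name on its own line, indented "key = value"
attributes below it, blank line between nodes). The host names and the job id
in SAMPLE are made up for illustration; feed it your real output instead.

```python
def parse_pbsnodes(text):
    """Parse `pbsnodes -a` style output into {node_name: {attribute: value}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current node record.
            current = None
            continue
        if not line[0].isspace():
            # An unindented line is a node name.
            current = line.strip()
            nodes[current] = {}
        elif current is not None and "=" in line:
            # Indented lines are attributes; split on the first "=" only.
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def free_nodes(nodes):
    """Return the sorted names of nodes whose state list contains 'free'."""
    return sorted(
        name
        for name, attrs in nodes.items()
        if "free" in attrs.get("state", "").split(",")
    )


# Hypothetical sample data, not taken from a real cluster.
SAMPLE = """\
node01
     state = free
     np = 2
     ntype = cluster

node02
     state = job-exclusive
     np = 2
     jobs = 0/123.server

node03
     state = down,offline
     np = 2
"""

if __name__ == "__main__":
    print(free_nodes(parse_pbsnodes(SAMPLE)))
```

If torque lists nodes as free here while jobs stay queued, that points at the
scheduler rather than at the nodes.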
Also, in maui-3.2.6p16-snap.1155916970.tar.gz a couple of timers are set to
shorter values (e.g. for "deferred" jobs), so those jobs are reconsidered for
scheduling sooner.
Retry with this snapshot, and compile it against the torque version you are
actually running (the snapshot also fixed issues with preemption).
If you don't need maui features like preemption or reservations, then simply
go with pbs_sched, which has served me well for over 2 years with
millions of jobs.