[torqueusers] Torque changes node state frequently
Danny Sternkopf
dsternkopf at hpce.nec.com
Tue Aug 15 10:59:01 MDT 2006
Hi Garrick,
Yes, you are absolutely right. There is a daemon that runs
'pbsnodes -r'. The behavior of that option has changed.
Sorry, I didn't see that in the ChangeLog. It was probably covered by the
version 1.2.0p6 entry ' - improved pbsnodes 'offline' management', right?
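
For anyone chasing the same symptom, here is a minimal sketch of how the server log could be scanned to see who is requesting node attribute changes. The semicolon-separated field layout is an assumption taken only from the excerpt quoted below (with the wrapped lines rejoined), the function name requesters is hypothetical, and the pattern accepts both "user@host" and the archive's "user at host" spelling.

```python
import re
from collections import Counter

# Assumed layout, taken from the quoted excerpt only:
#   date time;event code;daemon;object type;object name;message
REQUEST = re.compile(r"attributes set:\s+at request of (\S+?)(?:@| at )(\S+)")


def requesters(lines):
    """Count (node, user, host) triples for requested node attribute changes."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(";", 5)
        # Skip anything that is not a six-field record about a node.
        if len(fields) != 6 or fields[3] != "node":
            continue
        match = REQUEST.search(fields[5])
        if match:
            counts[(fields[4], match.group(1), match.group(2))] += 1
    return counts


# Sample records copied from the quoted log, rejoined onto single lines.
sample = [
    "08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;"
    "attributes set: at request of root at cacau1.nec",
    "08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;"
    "node noco120.nec state changed from job-exclusive to down,job-exclusive",
    "08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;"
    "attributes set: state - offline",
]

for (node, user, host), n in requesters(sample).items():
    print(f"{node}: {n} change(s) requested by {user} on {host}")
```

Pointing the same function at an open server log file instead of the sample list would show whether a single host accounts for all of the requests, which is what turned out to be the case here.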
Thank you very much for your quick help!
Best regards,
Danny
Garrick Staples wrote:
> On Tue, Aug 15, 2006 at 02:37:16PM +0200, Danny Sternkopf alleged:
>> Hi,
>>
>> We updated our 200-node cluster to Torque version 2.1.0p0. (I know it
>> is a bit outdated by now.)
>>
>> I can see that Torque is changing the node state from free/job-exclusive
>> to down and, one minute later, back to the original state.
>> This happens on all the nodes every 5-10 minutes.
>> The scheduler (Maui) doesn't like it when all resources are gone and
>> blocks all the queued jobs.
>>
>> Here is an example:
> ...
>
>> 08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;attributes set: at
>> request of root at cacau1.nec
>> 08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;node noco120.nec
>> state changed from job-exclusive to down,job-exclusive
>> 08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;attributes set:
>> state - offline
>> 08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;attributes set:
>> state + down
>
> You should go ask root at cacau1.nec why he is setting the nodes down.
>
> This isn't happening from within TORQUE; you've got something external
> doing this.
>
--
Danny Sternkopf dsternkopf at hpce.nec.com
High Performance Computing Europe GmbH http://www.hpce.nec.com
Stuttgart, Germany phone: +49-711-68770-35 fax: +49-711-6877145