[torqueusers] Torque changes node state frequently

Danny Sternkopf dsternkopf at hpce.nec.com
Tue Aug 15 10:59:01 MDT 2006


Hi Garrick,

Yes, you are absolutely right. There is a daemon which is running 
'pbsnodes -r'. The behavior of that option has changed.
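
For anyone who sees the same symptom, here is a minimal sketch of how the 
requesting user and host can be pulled out of the server log. It is not part 
of TORQUE. The function name `requesters` and the field layout are my own 
assumptions, inferred from the log lines quoted further down. The list 
archive shows "@" as " at ", so the pattern accepts both forms.

```python
import re
from collections import Counter

# Assumed record layout, inferred from the quoted log lines:
#   date time;event code;daemon;object type;object name;message
# Matches "attributes set: ... at request of user@host" (or "user at host").
REQUEST = re.compile(r"attributes set:.*at request of (\S+?)(?:@| at )(\S+)")


def requesters(lines):
    """Count (user, host, node) triples for attribute changes on nodes."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(";", 5)
        # Skip anything that is not a six-field record about a node.
        if len(fields) != 6 or fields[3] != "node":
            continue
        match = REQUEST.search(fields[5])
        if match:
            counts[(match.group(1), match.group(2), fields[4])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;"
        "attributes set:  at request of root at cacau1.nec",
        "08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;"
        "attributes set: state - offline",
    ]
    for (user, host, node), n in requesters(sample).items():
        print(f"{user}@{host} changed {node} {n} time(s)")
```

Feeding it the lines of the pbs_server log (one record per line, not wrapped 
as in this mail) shows at a glance whether the changes come from TORQUE 
itself or from some external host.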

Sorry, I didn't see that in the ChangeLog. It was probably the entry in 
version 1.2.0p6, ' - improved pbsnodes 'offline' management', right?

Thank you very much for your quick help!

Best regards,

Danny

Garrick Staples wrote:
> On Tue, Aug 15, 2006 at 02:37:16PM +0200, Danny Sternkopf alleged:
>> Hi,
>>
>> We updated our 200-node cluster to Torque version 2.1.0p0. (I know it 
>> is a bit outdated by now.)
>>
>> I can see that Torque is changing the node state from free/job-exclusive 
>> to down and, one minute later, back to the original state.
>> This happens on all the nodes every 5-10 minutes.
>> The scheduler (Maui) doesn't like it if all resources are gone and 
>> blocks all the queued jobs.
>>
>> Here is an example:
> ...
> 
>> 08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;attributes set:  at request of root at cacau1.nec
>> 08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;node noco120.nec state changed from job-exclusive to down,job-exclusive
>> 08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;attributes set: state - offline
>> 08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;attributes set: state + down
> 
> You should go ask root at cacau1.nec why he is setting the nodes down.
> 
> This isn't happening from within TORQUE, you've got something external
> doing this.

-- 
Danny Sternkopf                         dsternkopf at hpce.nec.com
High Performance Computing Europe GmbH  http://www.hpce.nec.com
Stuttgart, Germany phone: +49-711-68770-35 fax: +49-711-6877145

