[torqueusers] Torque changes node state frequently

Garrick Staples garrick at clusterresources.com
Tue Aug 15 10:24:26 MDT 2006


On Tue, Aug 15, 2006 at 02:37:16PM +0200, Danny Sternkopf alleged:
> Hi,
> 
> We updated our 200-node cluster to Torque version 2.1.0p0. (I know it 
> is a bit outdated by now.)
> 
> I can see that Torque is changing the node state from free/job-exclusive 
> to down and one minute later back to the original state.
> This happens on all the nodes every 5-10 minutes.
> The scheduler (Maui) doesn't like it when all resources are gone and 
> blocks all the queued jobs.
> 
> Here is an example:
...

> 08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;attributes set:  at 
> request of root@cacau1.nec
> 08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;node noco120.nec 
> state changed from job-exclusive to down,job-exclusive
> 08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;attributes set: 
> state - offline
> 08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;attributes set: 
> state + down

You should go ask root@cacau1.nec why he is setting the nodes down.

This isn't happening from within TORQUE. The "at request of" entries in
the log show that something external is issuing these state changes.
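
To see how often this happens and from which account, one option is to
tally the "at request of" entries in the pbs_server log. The following is
only a rough sketch: it assumes each log record is a single line with six
semicolon-separated fields, as in the excerpt quoted above, and that the
requester is written as user@host. The function name count_requesters and
the sample list are made up for illustration and are not part of TORQUE.

```python
import re
from collections import Counter

# Matches the "at request of user@host" text seen in the quoted log lines.
REQUEST_RE = re.compile(r"at request of (\S+)")


def count_requesters(lines):
    """Count node attribute changes per requesting user@host."""
    counts = Counter()
    for line in lines:
        # Assumed layout: timestamp;code;server;object type;object name;message
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) != 6 or fields[3] != "node":
            continue
        match = REQUEST_RE.search(fields[5])
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample records taken from the quoted excerpt, unwrapped onto single lines.
sample = [
    "08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;"
    "attributes set:  at request of root@cacau1.nec",
    "08/15/2006 14:25:53;0004;PBS_Server;node;noco120.nec;"
    "node noco120.nec state changed from job-exclusive to down,job-exclusive",
]

print(count_requesters(sample))
```

Feeding it the lines of a real server log file instead of the sample list
should show whether a single account on a single host accounts for all of
the changes, which would point to a cron job or monitoring script there.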


