[torqueusers] error in torque 1.2.0p6

Dave Jackson jacksond at clusterresources.com
Thu Jan 12 23:25:08 MST 2006


Mr Tony,

  I think your first step should be to upgrade to the latest TORQUE (i.e.,
2.0.0p5).  Garrick contributed several patches that improve the stability
of pbs_sched.  Your second step may be to move off pbs_sched altogether.
Please let us know whether this fixes the instability.

Dave 

On Thu, 2006-01-12 at 17:43 -0800, Mr Tony Ling wrote:
>  Hi,
> 
>     I have a 128-node cluster running TORQUE 1.2.0p6. Every time
> users submit a batch of jobs, the TORQUE scheduler terminates
> itself and writes the following error to the log file. After that,
> users can't submit any more jobs until the scheduler is
> restarted.
> 
> PBS_Server;Connection refused (111) in contact_sched, Could not contact Scheduler - port 15004
> 01/12/2006 09:58:46;0001;PBS_Server;Svr;PBS_Server;Connection refused (111) in contact_sched, Could not contact Scheduler - port 15004
> 
>       I have had to write a cron job that checks whether the TORQUE
> scheduler process is alive and restarts it if it has died.
> 
>      Can anyone help me with this? Thanks.
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
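The cron watchdog Tony describes could look roughly like the Python sketch below. It is an illustration, not TORQUE code, and it assumes two things not stated in the thread: that pbs_sched listens on port 15004 on the local host (the port named in the log), and that it can be restarted by running the binary at a hypothetical path, /usr/local/sbin/pbs_sched. The probe makes the same kind of TCP connection that fails in the log, so a "refused" result (errno 111, ECONNREFUSED on Linux) means nothing is listening and the scheduler has most likely exited. A timeout is only reported, since it suggests a scheduler that is alive but unresponsive, and starting a second copy would not help.

```python
"""Cron watchdog sketch: restart pbs_sched when its port refuses connections."""
import errno
import socket
import subprocess
import sys
from typing import Callable, Sequence

SCHED_PORT = 15004  # port reported in the pbs_server log
# Hypothetical install path (assumption): adjust for your site.
RESTART_CMD = ["/usr/local/sbin/pbs_sched"]


def probe_scheduler(host: str = "localhost", port: int = SCHED_PORT,
                    timeout: float = 2.0) -> str:
    """Return 'listening', 'refused', 'timeout', or another errno name."""
    try:
        # Full TCP handshake, the kind of connect that fails in the log.
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError:
        # ECONNREFUSED (111 on Linux): nothing is bound to the port,
        # so the scheduler process has most likely exited.
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))


def ensure_scheduler(host: str = "localhost", port: int = SCHED_PORT,
                     restart_cmd: Sequence[str] = RESTART_CMD,
                     run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
                     ) -> bool:
    """Restart the scheduler only if its port refuses connections.

    Returns True if a restart was attempted.
    """
    status = probe_scheduler(host, port)
    if status != "refused":
        if status != "listening":
            # Timeouts and other errors are logged but not acted on.
            print(f"scheduler probe on {host}:{port}: {status}", file=sys.stderr)
        return False
    print(f"scheduler not reachable on {host}:{port}; restarting", file=sys.stderr)
    try:
        # check=False: report a failing restart instead of raising.
        result = run(list(restart_cmd), check=False)
    except OSError as exc:  # e.g. the binary is not at RESTART_CMD
        print(f"could not start scheduler: {exc}", file=sys.stderr)
        return True
    if result.returncode != 0:
        print(f"restart command exited with {result.returncode}", file=sys.stderr)
    return True


if __name__ == "__main__":
    ensure_scheduler()
```

It would be run from cron (for example every five minutes) under the account that normally starts pbs_sched. Note that a port check only shows that something accepts connections, not that scheduling cycles succeed, and a watchdog works around the crash rather than fixing it; the upgrade Dave suggests is still the real remedy.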


