[torqueusers] error in torque 1.2.0p6

Mr Tony Ling tonylsp at yahoo.com
Thu Jan 12 18:43:07 MST 2006


 Hi,
 
     I have a 128-node cluster running torque 1.2.0p6. Every time a user submits a batch of jobs, the torque scheduler terminates itself and the following error appears in the log file. After that, users can't submit any more jobs until the torque scheduler is restarted.
 
 PBS_Server;Connection refused (111) in contact_sched, Could not contact Scheduler - port 15004 
 01/12/2006 09:58:46;0001;PBS_Server;Svr;PBS_Server;Connection refused (111) in contact_sched, Could not contact Scheduler - port 15004
 
       I have had to write a cron job that checks the health of the torque scheduler process and starts it again if it is dead.
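
 A watchdog of the kind described above might look roughly like the sketch below. It is not the script actually in use on the cluster. It assumes the scheduler runs on the same host as the watchdog, listens on the port 15004 named in the log message, and can be started by running pbs_sched; the host, the restart command, and the function names are all illustrative and would need adjusting for a real installation. The scheduler may log the bare probe connection as an unexpected request.

```python
#!/usr/bin/env python3
"""Watchdog sketch: restart the scheduler if nothing accepts connections on its port."""
import socket
import subprocess

SCHED_HOST = "localhost"     # assumption: scheduler runs on this host
SCHED_PORT = 15004           # port named in the PBS_Server log message
RESTART_CMD = ["pbs_sched"]  # assumption: replace with the full path to your scheduler daemon


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_restart(cmd):
    """Run the restart command; return its exit status, or -1 if it could not be started."""
    try:
        return subprocess.call(cmd)
    except OSError as exc:
        print("could not run %s: %s" % (cmd, exc))
        return -1


def check_and_restart(host=SCHED_HOST, port=SCHED_PORT, restart_cmd=RESTART_CMD,
                      probe=port_is_open, run=run_restart):
    """Restart the scheduler when the probe fails; return True if a restart was attempted."""
    if probe(host, port):
        return False
    run(restart_cmd)
    return True


if __name__ == "__main__":
    if check_and_restart():
        print("scheduler was not reachable; restart attempted")
```

 Run from cron every few minutes (for example with a crontab schedule of */5 * * * *), this only works around the symptom. It does not explain why the scheduler exits in the first place.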
 
      Could anyone please help me with this? Thanks.
 

			

