[torqueusers] Torque 2.5.1 : pbs_server crashing and other problems

torqueusers at calcua.ua.ac.be
Tue Jul 27 07:38:32 MDT 2010


After having restarted the pbs_server service, I got these log messages 
(server name stripped):

   07/26/2010 17:00:24;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::job_recov,
   appears to be from an old version. Attempting to convert.
   backed up to

Since then, I have been experiencing serious problems with the TORQUE
resource manager service. The pbs_server daemon crashes every few minutes
and produces strange messages. When I restart the pbs_server daemon using
the /etc/init.d script, it shows:

   Shutting down TORQUE Server:                               [  OK  ]
   Starting TORQUE Server: catch_child caught pid 18319
   catch_child no work task found for pid 18319

Both 'qstat' and 'checkjob' show me messages like this:

   Message[12] job rejected by RM 'torque' - job started on hostlist
   cn005. at time 15:21:49_07/27, job reported idle at
   time 15:23:22_07/27 (see RM logs for details)

Any hints about what to do to overcome these issues?

Thanks in advance!

