[torqueusers] Torque 2.5.1 : pbs_server crashing and other problems
torqueusers at calcua.ua.ac.be
Tue Jul 27 07:38:32 MDT 2010
Hello,
After having restarted the pbs_server service, I got these log messages
(server name stripped):
07/26/2010 17:00:24;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::job_recov, /var/spool/torque/server_priv/jobs/18750.JB appears to be from an old version. Attempting to convert.
07/26/2010 17:00:24;0001;PBS_Server;Svr;PBS_Server;LOG_ERROR::job_qs_upgrade, backed up to /var/spool/torque/server_priv/jobs/18750.BK
Since then, however, I have been experiencing serious problems with the
Torque RM service. The pbs_server daemon crashes every few minutes,
giving strange messages. When I restart the pbs_server daemon using the
/etc/init.d script, it shows:
Shutting down TORQUE Server: [ OK ]
Starting TORQUE Server: catch_child caught pid 18319
catch_child no work task found for pid 18319
Both 'qstat' and 'checkjob' show me messages like this:
Message[12] job rejected by RM 'torque' - job started on hostlist cn005. at time 15:21:49_07/27, job reported idle at time 15:23:22_07/27 (see RM logs for details)
Any hints about what to do to overcome these issues?
Thanks in advance!
Franky