[torqueusers] Torque 2.4.9 - job reported idle at time
Ken Nielson
knielson at adaptivecomputing.com
Tue Jul 27 17:12:34 MDT 2010
On 07/27/2010 03:08 PM, torqueusers at calcua.ua.ac.be wrote:
> 07/27/2010 22:05:19;0008;PBS_Server;Job;reply_send;Reply sent for request type JobObituary on socket 12
> 07/27/2010 22:05:19;0009;PBS_Server;Job;19000.master1.ourmachine.com;job exit status -3 handled
> 07/27/2010 22:05:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 19000.master1.ourmachine.com state from RUNNING-RERUN1 to EXITING-RERUN1 (5-61)
> 07/27/2010 22:05:19;0009;PBS_Server;Job;19000.master1.ourmachine.com;on_job_rerun task assigned to job
> 07/27/2010 22:05:19;0009;PBS_Server;Job;19000.master1.ourmachine.com;req_jobobit completed
> 07/27/2010 22:05:19;0004;PBS_Server;Svr;svr_connect;attempting connect to host 10.28.0.64 port 15002
> 07/27/2010 22:05:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 19000.master1.ourmachine.com state from EXITING-RERUN1 to EXITING-RERUN2 (5-62)
> 07/27/2010 22:05:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 19000.master1.ourmachine.com state from EXITING-RERUN2 to EXITING-RERUN3 (5-63)
> 07/27/2010 22:05:19;0040;PBS_Server;Req;free_nodes;freeing nodes for job 19000.master1.ourmachine.com
This might be something to look at. It appears that job 19000 is failing
with an exit status of -3, and this job is running on the machines in the
hostlist. The server then sets the job to rerun.
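To follow the same trail (exit status -3, then the RERUN states) through a longer server log, a small script can pull out one job's exit statuses and state transitions. This is a hypothetical sketch, not a Torque tool: `trace_job`, `JOB_EXEC_CODES`, `SAMPLE_LOG` and the regexes are illustrative names, and the field layout (timestamp;event type;source;object class;object name;message) is inferred from the log excerpt quoted above. The code table assumes the JOB_EXEC_* values from Torque's job.h, where -3 is JOB_EXEC_RETRY ("job execution failed, do retry"), which would be consistent with the server queuing the job for rerun. Check the headers or documentation of the Torque version you actually run.

```python
import re

# Assumed subset of Torque's JOB_EXEC_* exit codes (job.h); verify against
# the headers of the Torque version you actually run.
JOB_EXEC_CODES = {
    0: "JOB_EXEC_OK (job executed successfully)",
    -1: "JOB_EXEC_FAIL1 (exec failed before files, no retry)",
    -2: "JOB_EXEC_FAIL2 (exec failed after files, no retry)",
    -3: "JOB_EXEC_RETRY (exec failed, do retry)",
}

# One server log record: timestamp;event type;source;object class;object name;message
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2});"
    r"(?P<evtype>[0-9A-Fa-f]{4});"
    r"(?P<source>[^;]*);(?P<objclass>[^;]*);(?P<objname>[^;]*);"
    r"(?P<msg>.*)$"
)
STATE_RE = re.compile(r"setting job (\S+) state from (\S+) to (\S+)")
EXIT_RE = re.compile(r"job exit status (-?\d+) handled")


def trace_job(lines, job_id):
    """Collect exit statuses and state transitions for one job (hypothetical helper)."""
    exits, transitions = [], []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip wrapped fragments or unrelated text
        msg = m.group("msg")
        # Exit-status records carry the job id in the object-name field.
        if m.group("objname") == job_id:
            e = EXIT_RE.search(msg)
            if e:
                exits.append(int(e.group(1)))
        # svr_setjobstate records carry the job id inside the message text.
        s = STATE_RE.search(msg)
        if s and s.group(1) == job_id:
            transitions.append((s.group(2), s.group(3)))
    return exits, transitions


# Unwrapped lines taken from the excerpt quoted in this thread.
SAMPLE_LOG = [
    "07/27/2010 22:05:19;0009;PBS_Server;Job;19000.master1.ourmachine.com;"
    "job exit status -3 handled",
    "07/27/2010 22:05:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: "
    "setting job 19000.master1.ourmachine.com state from RUNNING-RERUN1 to "
    "EXITING-RERUN1 (5-61)",
    "07/27/2010 22:05:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: "
    "setting job 19000.master1.ourmachine.com state from EXITING-RERUN1 to "
    "EXITING-RERUN2 (5-62)",
    "07/27/2010 22:05:19;0040;PBS_Server;Req;free_nodes;"
    "freeing nodes for job 19000.master1.ourmachine.com",
]

if __name__ == "__main__":
    exits, transitions = trace_job(SAMPLE_LOG, "19000.master1.ourmachine.com")
    for code in exits:
        print(code, JOB_EXEC_CODES.get(code, "unknown code"))
    for old, new in transitions:
        print(f"{old} -> {new}")
```

Run against the sample, it reports the single -3 exit with its JOB_EXEC_RETRY label, then the RUNNING-RERUN1 -> EXITING-RERUN1 and EXITING-RERUN1 -> EXITING-RERUN2 transitions. Matching on the job id in two different places reflects the excerpt: exit records name the job in the object field, while svr_setjobstate records name it only in the message. Lines wrapped by a mail client, as in the archived quote, would have to be rejoined first.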
Ken Nielson
Adaptive Computing