[torqueusers] Torque 2.4.9 - job reported idle at time

Ken Nielson knielson at adaptivecomputing.com
Tue Jul 27 17:12:34 MDT 2010


On 07/27/2010 03:08 PM, torqueusers at calcua.ua.ac.be wrote:
> 07/27/2010 22:05:19;0008;PBS_Server;Job;reply_send;Reply sent for request type JobObituary on socket 12
> 07/27/2010 22:05:19;0009;PBS_Server;Job;19000.master1.ourmachine.com;job exit status -3 handled
> 07/27/2010 22:05:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 19000.master1.ourmachine.com state from RUNNING-RERUN1 to EXITING-RERUN1 (5-61)
> 07/27/2010 22:05:19;0009;PBS_Server;Job;19000.master1.ourmachine.com;on_job_rerun task assigned to job
> 07/27/2010 22:05:19;0009;PBS_Server;Job;19000.master1.ourmachine.com;req_jobobit completed
> 07/27/2010 22:05:19;0004;PBS_Server;Svr;svr_connect;attempting connect to host 10.28.0.64 port 15002
> 07/27/2010 22:05:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 19000.master1.ourmachine.com state from EXITING-RERUN1 to EXITING-RERUN2 (5-62)
> 07/27/2010 22:05:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 19000.master1.ourmachine.com state from EXITING-RERUN2 to EXITING-RERUN3 (5-63)
> 07/27/2010 22:05:19;0040;PBS_Server;Req;free_nodes;freeing nodes for job 19000.master1.ourmachine.com
This may be worth looking at. Job 19000 appears to be failing with an 
exit status of -3. It was running on the machines in the hostlist. The 
server then sets the job to rerun, moving it from RUNNING-RERUN1 through 
EXITING-RERUN3 and freeing its nodes.
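
If it helps to check whether other jobs follow the same pattern, here is a rough sketch of how one might pull exit statuses and state transitions out of the server log. It assumes each record sits on a single line (the mail client wrapped the ones quoted above) and that the fields are semicolon-separated as in the excerpt. The names summarize, SAMPLE_LOG, EXIT_RE and STATE_RE are made up for this example and are not part of TORQUE.

```python
import re

# Sample records taken from the quoted pbs_server log, rejoined so that
# each record is on one line (assumption: the real log is not wrapped).
SAMPLE_LOG = """\
07/27/2010 22:05:19;0009;PBS_Server;Job;19000.master1.ourmachine.com;job exit status -3 handled
07/27/2010 22:05:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 19000.master1.ourmachine.com state from RUNNING-RERUN1 to EXITING-RERUN1 (5-61)
07/27/2010 22:05:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 19000.master1.ourmachine.com state from EXITING-RERUN1 to EXITING-RERUN2 (5-62)
07/27/2010 22:05:19;0001;PBS_Server;Svr;PBS_Server;svr_setjobstate: setting job 19000.master1.ourmachine.com state from EXITING-RERUN2 to EXITING-RERUN3 (5-63)
"""

# Patterns matched against the message field only (hypothetical helpers).
EXIT_RE = re.compile(r"job exit status (-?\d+) handled")
STATE_RE = re.compile(r"setting job (\S+) state from (\S+) to (\S+)")


def summarize(log_text):
    """Return (exits, transitions) keyed by job id.

    exits maps job id -> last exit status seen.
    transitions maps job id -> list of (old_state, new_state) in log order.
    """
    exits = {}
    transitions = {}
    for line in log_text.splitlines():
        # Six fields: timestamp, event code, daemon, object type,
        # object name, message. The message may itself contain ';'.
        fields = line.split(";", 5)
        if len(fields) != 6:
            continue  # skip blank or wrapped fragments
        obj_name, message = fields[4], fields[5]

        match = EXIT_RE.search(message)
        if match:
            # On these records the object name is the job id.
            exits[obj_name] = int(match.group(1))
            continue

        match = STATE_RE.search(message)
        if match:
            job, old_state, new_state = match.groups()
            transitions.setdefault(job, []).append((old_state, new_state))
    return exits, transitions
```

Calling summarize() on the contents of the server log file instead of SAMPLE_LOG would show which jobs exit with -3 and whether they all go through the same RERUN sequence.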

Ken Nielson
Adaptive Computing

