[torqueusers] pbs_mom Unknown job ID
glen.beane at gmail.com
Mon Jul 14 07:54:32 MDT 2008
On Mon, Jul 14, 2008 at 9:35 AM, Jeremy Mann <jeremy at biochem.uthscsa.edu> wrote:
> I had to reboot our frontend early Sunday morning, and there were about
> 10,000 jobs in our queue. The frontend recovered and has been processing
> the jobs; however, none of our compute nodes will process any jobs. In
> the mom_logs, I see:
> 07/14/2008 08:35:44;0080; pbs_mom;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=StatusJob, from PBS_Server@bcf.local
> According to pbsnodes, the compute nodes are job-exclusive but the jobs
> never run. How do I recover from this?
Can you please include your TORQUE version?
Have you tried rebooting your moms?
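
If it helps to see how widespread the rejects are before restarting anything, here is a rough Python sketch that splits a mom_log line into fields and counts reject codes. The field layout (timestamp;event;daemon;class;name;message) is only an assumption based on the single line quoted above, and the function names are made up for illustration, not part of TORQUE.

```python
import re
from collections import Counter

# Assumed layout, inferred from the one quoted log line:
#   timestamp;event code;daemon;object class;object name;message
REJECT_RE = re.compile(r"code=(\d+)\(([^)]*)\).*?type=(\w+)")


def parse_mom_log_line(line):
    """Split one mom_log line into a dict, or return None if it does not fit."""
    parts = [p.strip() for p in line.split(";", 5)]
    if len(parts) != 6:
        return None
    timestamp, event, daemon, obj_class, obj_name, message = parts
    record = {
        "timestamp": timestamp,
        "event": event,
        "daemon": daemon,
        "class": obj_class,
        "name": obj_name,
        "message": message,
    }
    # Pull out the reject code, its text, and the request type when present.
    match = REJECT_RE.search(message)
    if match:
        record["code"] = int(match.group(1))
        record["reason"] = match.group(2)
        record["request_type"] = match.group(3)
    return record


def count_rejects(lines):
    """Count (code, request type) pairs over an iterable of log lines."""
    counts = Counter()
    for line in lines:
        record = parse_mom_log_line(line)
        if record and "code" in record:
            counts[(record["code"], record["request_type"])] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "07/14/2008 08:35:44;0080; pbs_mom;Req;req_reject;Reject reply "
        "code=15001(Unknown Job Id), aux=0, type=StatusJob, "
        "from PBS_Server@bcf.local"
    )
    print(parse_mom_log_line(sample))
```

You could feed it the lines of one node's mom_log (for example via open(path)) and compare the counts between nodes; a log with a different layout will simply yield no matches.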