[torqueusers] pbs_mom Unknown job ID

Glen Beane glen.beane at gmail.com
Mon Jul 14 07:54:32 MDT 2008


On Mon, Jul 14, 2008 at 9:35 AM, Jeremy Mann <jeremy at biochem.uthscsa.edu> wrote:

> I had to reboot our frontend early Sunday morning, and there were about
> 10,000 jobs in our queue. The frontend recovered from this and has been
> processing the jobs, however, all of our compute nodes will not process
> any jobs. In the mom_logs, I see:
>
> 07/14/2008 08:35:44;0080;   pbs_mom;Req;req_reject;Reject reply
> code=15001(Unknown Job Id), aux=0, type=StatusJob, from
> PBS_Server at bcf.local
>
> According to pbsnodes, the compute nodes are job-exclusive but the jobs
> never run. How do I recover from this?



Can you please include your TORQUE version?

Have you tried rebooting your moms?
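To check whether these rejections keep occurring after the moms come back, here is a minimal Python sketch that counts them in a mom_logs file. It assumes the semicolon-separated layout visible in the quoted excerpt (timestamp; event code; daemon; object type; object name; message). The helper names `parse_line`, `reject_code`, and `count_unknown_job_rejects` are hypothetical and are not part of TORQUE.

```python
import re
from typing import Iterable, NamedTuple, Optional, Tuple


class MomLogEntry(NamedTuple):
    # Field layout assumed from the log excerpt in this thread.
    timestamp: str
    event_code: str
    daemon: str
    object_type: str
    object_name: str
    message: str


def parse_line(line: str) -> Optional[MomLogEntry]:
    # Split on the first five semicolons only, so any semicolons
    # inside the free-text message are left intact.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None  # not a log record in the assumed layout
    return MomLogEntry(*(p.strip() for p in parts))


# Matches e.g. "Reject reply code=15001(Unknown Job Id)".
REJECT_RE = re.compile(r"Reject reply code=(\d+)\(([^)]*)\)")


def reject_code(entry: MomLogEntry) -> Optional[Tuple[int, str]]:
    # Return (numeric code, description) for a reject reply, else None.
    m = REJECT_RE.search(entry.message)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)


def count_unknown_job_rejects(lines: Iterable[str]) -> int:
    # Count reject replies carrying code 15001, which the log itself
    # labels "Unknown Job Id".
    count = 0
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        code = reject_code(entry)
        if code is not None and code[0] == 15001:
            count += 1
    return count
```

You can pass it an open log file, for example `with open(path) as f: print(count_unknown_job_rejects(f))`, where `path` is one of the daily files in a node's mom_logs directory. If the count keeps growing after a restart, the server and the mom still disagree about which jobs exist.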

