[torqueusers] pbs_mom Unknown job ID
glen.beane at gmail.com
Mon Jul 14 08:18:59 MDT 2008
On Mon, Jul 14, 2008 at 10:01 AM, Jeremy Mann <jeremy at biochem.uthscsa.edu> wrote:
> Glen Beane wrote:
> > On Mon, Jul 14, 2008 at 9:35 AM, Jeremy Mann <jeremy at biochem.uthscsa.edu>
> > wrote:
> >> I had to reboot our frontend early Sunday morning, and there were about
> >> 10,000 jobs in our queue. The frontend recovered from this and has been
> >> processing the jobs; however, none of our compute nodes will process
> >> any jobs. In the mom_logs, I see:
> >> 07/14/2008 08:35:44;0080; pbs_mom;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=StatusJob, from PBS_Server at bcf.local
> >> According to pbsnodes, the compute nodes are job-exclusive but the jobs
> >> never run. How do I recover from this?
> > Can you please include your TORQUE version?
> > Have you tried rebooting your moms?
> I fixed it, Glen. Thanks for the response. The jobs that were assigned to
> the compute nodes needed to be deleted, then the moms began to accept new
> jobs. This is just something I'll need to keep in mind the next time I
> have to reboot the frontend node.
Sounds like a bug to me. What version of TORQUE are you using?
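
For anyone trying to spot the same symptom in their own mom_logs, here is a rough Python sketch that splits a log line like the one quoted above and checks for the 15001 (Unknown Job Id) reject. The field names (timestamp, event, daemon, obj_class, obj_name, message) are my own labels for the semicolon-separated parts, not official TORQUE terminology, and the helper names are hypothetical. It only reads log text; it does not talk to pbs_server or pbs_mom.

```python
import re
from typing import NamedTuple, Optional

# Sample line taken from the message quoted above, joined onto one line.
SAMPLE = (
    "07/14/2008 08:35:44;0080; pbs_mom;Req;req_reject;"
    "Reject reply code=15001(Unknown Job Id), aux=0, "
    "type=StatusJob, from PBS_Server at bcf.local"
)


class MomLogRecord(NamedTuple):
    # Field names are illustrative labels (an assumption), chosen to
    # match the six semicolon-separated parts seen in the sample line.
    timestamp: str
    event: str
    daemon: str
    obj_class: str
    obj_name: str
    message: str


def parse_mom_log_line(line: str) -> Optional[MomLogRecord]:
    """Split one log line into six fields; return None if it does not fit."""
    # Split at most 5 times so semicolons inside the message are preserved.
    parts = line.strip().split(";", 5)
    if len(parts) != 6:
        return None
    return MomLogRecord(*(p.strip() for p in parts))


# Matches text such as "Reject reply code=15001(Unknown Job Id)".
REJECT_RE = re.compile(r"Reject reply code=(\d+)\(([^)]*)\)")


def reject_code(record: MomLogRecord) -> Optional[int]:
    """Return the numeric reject code from the message, if there is one."""
    match = REJECT_RE.search(record.message)
    return int(match.group(1)) if match else None


def is_unknown_job_reject(line: str) -> bool:
    """True if the line is a reject reply carrying code 15001."""
    record = parse_mom_log_line(line)
    return record is not None and reject_code(record) == 15001


if __name__ == "__main__":
    print(is_unknown_job_reject(SAMPLE))
```

Running is_unknown_job_reject over each line of a mom log gives a quick count of how often a node is rejecting requests this way. It will not tell you which jobs are stale, since the quoted log line carries no job ID.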