[torqueusers] pbs_mom Unknown job ID

Glen Beane glen.beane at gmail.com
Mon Jul 14 08:18:59 MDT 2008


On Mon, Jul 14, 2008 at 10:01 AM, Jeremy Mann <jeremy at biochem.uthscsa.edu>
wrote:

>
> Glen Beane wrote:
> > On Mon, Jul 14, 2008 at 9:35 AM, Jeremy Mann <jeremy at biochem.uthscsa.edu>
> > wrote:
> >
> >> I had to reboot our frontend early Sunday morning, and there were about
> >> 10,000 jobs in our queue. The frontend recovered from this and has been
> >> processing the jobs; however, none of our compute nodes will process
> >> any jobs. In the mom_logs, I see:
> >>
> >> 07/14/2008 08:35:44;0080;   pbs_mom;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=StatusJob, from PBS_Server at bcf.local
> >>
> >> According to pbsnodes, the compute nodes are job-exclusive but the jobs
> >> never run. How do I recover from this?
> >
> >
> >
> > Can you please include your TORQUE version?
> >
> > Have you tried rebooting your moms?
>
> I fixed it, Glen; thanks for the response. The jobs that were assigned to
> the compute nodes needed to be deleted, then the moms began to accept new
> jobs. This is just something I'll need to keep in mind the next time I
> have to reboot the frontend node.
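
To make the recovery described above concrete: Jeremy does not say which commands he used, so what follows is only a sketch of one way to find the jobs pinned to the stuck nodes. It assumes that `pbsnodes -a` prints each node name unindented, followed by indented `key = value` lines with `state` listed before `jobs`, and that each job entry looks like `slot/jobid`. The node names and job numbers in the sample are made up for illustration. The script only prints `qdel` commands for review; it does not run them.

```python
def stuck_job_ids(pbsnodes_text):
    """Return sorted job IDs listed on nodes whose state includes job-exclusive.

    Assumes the pbsnodes layout described above; adjust if your version differs.
    """
    job_ids = set()
    state = ""
    for line in pbsnodes_text.splitlines():
        if line and not line[0].isspace():
            # An unindented line starts a new node record.
            state = ""
            continue
        key, sep, value = line.strip().partition(" = ")
        if not sep:
            continue  # blank line or a line without "key = value"
        if key == "state":
            state = value
        elif key == "jobs" and "job-exclusive" in state.split(","):
            for entry in value.split(","):
                # Entries are assumed to look like "0/1234.bcf.local";
                # drop the "slot/" prefix and keep the job ID.
                job_ids.add(entry.strip().split("/", 1)[-1])
    return sorted(job_ids)


# Hypothetical sample; on a real system this text would come from `pbsnodes -a`.
SAMPLE = """\
compute-0-0
     state = job-exclusive
     np = 2
     jobs = 0/1234.bcf.local, 1/1235.bcf.local

compute-0-1
     state = free
     np = 2
"""

for job_id in stuck_job_ids(SAMPLE):
    # Print the command instead of running it, so the list can be checked first.
    print("qdel", job_id)
```

Printing rather than executing is deliberate: with about 10,000 jobs queued, only the ones still pinned to the job-exclusive nodes should be removed.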



Sounds like a bug to me. What version of TORQUE are you using?
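
For anyone triaging the same symptom, the mom_logs entry quoted above can be picked apart mechanically. This is a minimal sketch that assumes the entry is a single line with six semicolon-separated fields, as in the quoted example; the field names (`timestamp`, `event`, `daemon`, `class`, `name`, `message`) are my own labels, not official TORQUE terminology.

```python
import re


def parse_mom_log_line(line):
    """Split one mom_logs entry into labelled fields (labels are illustrative)."""
    timestamp, event, daemon, kind, name, message = line.split(";", 5)
    record = {
        "timestamp": timestamp,
        "event": event,
        "daemon": daemon.strip(),  # the quoted entry pads this field with spaces
        "class": kind,
        "name": name,
        "message": message,
    }
    # Pull out "code=15001(Unknown Job Id)" style rejections when present.
    code = re.search(r"code=(\d+)\(([^)]*)\)", message)
    if code:
        record["code"] = int(code.group(1))
        record["reason"] = code.group(2)
    # Pull out the request type, e.g. "type=StatusJob".
    request = re.search(r"type=(\w+)", message)
    if request:
        record["request"] = request.group(1)
    return record


# The entry from the original report, joined back onto one line.
LINE = (
    "07/14/2008 08:35:44;0080;   pbs_mom;Req;req_reject;"
    "Reject reply code=15001(Unknown Job Id), aux=0, type=StatusJob, "
    "from PBS_Server at bcf.local"
)

record = parse_mom_log_line(LINE)
print(record["daemon"], record["code"], record["reason"], record["request"])
```

Counting these records per node would show whether every mom is rejecting the server's StatusJob requests or only some of them.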