[torqueusers] pbs_mom Unknown job ID

Jeremy Mann jeremy at biochem.uthscsa.edu
Mon Jul 14 08:01:03 MDT 2008


Glen Beane wrote:
> On Mon, Jul 14, 2008 at 9:35 AM, Jeremy Mann <jeremy at biochem.uthscsa.edu>
> wrote:
>
>> I had to reboot our frontend early Sunday morning, and there were about
>> 10,000 jobs in our queue. The frontend recovered from this and has been
>> processing the jobs; however, none of our compute nodes will run
>> any jobs. In the mom_logs, I see:
>>
>> 07/14/2008 08:35:44;0080;   pbs_mom;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=StatusJob, from PBS_Server@bcf.local
>>
>> According to pbsnodes, the compute nodes are job-exclusive, but the jobs
>> never run. How do I recover from this?
>
> Can you please include your TORQUE version?
>
> Have you tried rebooting your MOMs?

I fixed it, Glen; thanks for the response. The jobs that had been assigned
to the compute nodes needed to be deleted; after that, the MOMs began
accepting new jobs. This is something I'll need to keep in mind the next
time I have to reboot the frontend node.
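For anyone hitting the same problem, the cleanup described in this message (deleting the jobs still assigned to the stuck compute nodes) can be approximated with a short script that lists the job IDs pbs_server still has on job-exclusive nodes, so they can be reviewed before running qdel. This is a minimal sketch, not what was actually run here. It assumes the `pbsnodes -a` layout shown in the hypothetical SAMPLE string (node name on its own line, indented `key = value` attributes, `jobs = slot/jobid` entries), which may differ between TORQUE versions. It only prints qdel commands rather than executing them.

```python
from typing import Dict, List

# Hypothetical `pbsnodes -a` output; compare with your own cluster's output,
# since the exact layout may vary between TORQUE versions.
SAMPLE = """\
node01
     state = job-exclusive
     np = 2
     jobs = 0/1234.bcf.local, 1/1234.bcf.local
node02
     state = free
     np = 2
"""


def parse_nodes(text: str) -> Dict[str, Dict[str, str]]:
    """Map each node name to its indented 'key = value' attributes."""
    nodes: Dict[str, Dict[str, str]] = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = line.strip()
            nodes[current] = {}
        elif current is not None:
            key, sep, value = line.strip().partition(" = ")
            if sep:
                nodes[current][key] = value
    return nodes


def stuck_job_ids(text: str) -> List[str]:
    """Collect unique job IDs listed on nodes whose state is job-exclusive."""
    ids: List[str] = []
    for attrs in parse_nodes(text).values():
        if "job-exclusive" not in attrs.get("state", ""):
            continue
        for slot in attrs.get("jobs", "").split(","):
            slot = slot.strip()
            if "/" in slot:
                # Entries look like "<cpu slot>/<job id>"; keep the job id.
                job_id = slot.split("/", 1)[1]
                if job_id not in ids:
                    ids.append(job_id)
    return ids


if __name__ == "__main__":
    # Print the commands for review instead of deleting anything.
    for job_id in stuck_job_ids(SAMPLE):
        print(f"qdel {job_id}")
```

Printing the commands rather than calling qdel directly leaves room to check each job (for example with qstat) before removing it from a queue that still holds thousands of other jobs.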



-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672
