[torqueusers] pbs_mom Unknown job ID

Jeremy Mann jeremy at biochem.uthscsa.edu
Mon Jul 14 07:35:32 MDT 2008


I had to reboot our frontend early Sunday morning while there were about
10,000 jobs in our queue. The frontend recovered and has been processing
the jobs; however, none of our compute nodes will run any of them. In the
mom_logs, I see:

07/14/2008 08:35:44;0080;   pbs_mom;Req;req_reject;Reject reply code=15001(Unknown Job Id), aux=0, type=StatusJob, from PBS_Server@bcf.local

According to pbsnodes, the compute nodes are in the job-exclusive state,
but the jobs never run. How do I recover from this?
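
For anyone checking their own mom_logs for the same symptom, here is a
minimal Python sketch that pulls the reject code, reason, and request type
out of a line like the one quoted above (with the wrapped lines joined into
one). The field layout is an assumption inferred from that single log line
(semicolon-separated, message last), and parse_reject is a hypothetical
helper name, not part of TORQUE.

```python
import re

# Assumed layout, inferred from the one log line in this post:
# timestamp;event-mask;daemon;class;function;message
REJECT_RE = re.compile(r"code=(\d+)\(([^)]*)\).*?type=(\w+)")


def parse_reject(line):
    """Return (timestamp, code, reason, request_type), or None if no match."""
    # Split on the first five semicolons only, so the message stays whole.
    fields = [f.strip() for f in line.split(";", 5)]
    if len(fields) < 6:
        return None
    match = REJECT_RE.search(fields[5])
    if match is None:
        return None
    code, reason, req_type = match.groups()
    return fields[0], int(code), reason, req_type
```

Running this over every line of a mom log and tallying the results by code
and request type should show whether the rejections are all StatusJob
requests from the server or something broader.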


-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672


