[torqueusers] pbs_mom trying to kill job that does not exist

Roger Moye moye at rice.edu
Tue May 12 09:26:40 MDT 2009


The point of my "qdel -p" example was simply to show how the problem can
be reproduced on demand. What concerns me most is when the problem
happens on its own and goes undetected. In some cases a job will exit
the system, but for some reason the mom doesn't know it and keeps trying
to kill the job. Eventually this will crash the torque server. That's
what I'd like to avoid.


Roger Moye
Linux Cluster Administrator
TeraGrid Campus Champion
Rice University
Dept. of Academic and Research Computing
Research Computing Support Group
(713) 348-5756
moye at rice.edu
