[torqueusers] torque-1.2.0p6 - massive emails and job nanny

Tony Vu tonyv at sdsc.edu
Mon Sep 19 17:27:56 MDT 2005


Hello,

Like some people on this list, our users have received multiple
emails in the past when their jobs completed.  We recently
upgraded to patch 6 and are still seeing this problem.  After
browsing through this list, I read that the "job_nanny" attribute
needs to be turned on to alleviate this problem, since it is not
set by default.

From what I understand, Torque will repeatedly send kill/cancel/delete
signals to an exiting job if for some reason it cannot communicate
with the mother superior node on the initial try.  Is this correct?
If I set the job_nanny attribute to true, will only the initial job
delete signal be acknowledged and subsequent ones be ignored?  Does
support for it have to be enabled in the configure script before
compiling Torque, or is it compiled in by default?

Also, is a server restart required after this server attribute is
set, or does the change take effect dynamically?

Thanks.

-----
Tony Vu
HPC Systems Engineer
San Diego Supercomputer Center
858-822-5491
tonyv at sdsc.edu




