[Mauiusers] Re: [torqueusers] Email spam from maui/torque

David Jackson jacksond at clusterresources.com
Wed Mar 23 18:38:32 MST 2005


Chris,

  There are two efforts being made to address this.  First, we are
rolling in some facilities used in Moab to provide exponential back-off
of failed cancel job requests in Maui.  This should minimize the amount
of mail sent.
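
  To illustrate the idea only (this is not the Maui or Moab code, just a
minimal Python sketch), exponential back-off of a failed cancel could look
like the following.  The names backoff_delays and cancel_with_backoff, the
60 second starting delay, the doubling factor, and the one hour cap are all
assumptions made for the example.

```python
import time
from typing import Callable, Iterator


def backoff_delays(base: float = 60.0, factor: float = 2.0,
                   cap: float = 3600.0) -> Iterator[float]:
    """Yield retry delays in seconds: base, base*factor, ... never above cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def cancel_with_backoff(cancel: Callable[[str], bool], job_id: str,
                        max_attempts: int = 8,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Call cancel(job_id) until it succeeds or max_attempts is reached.

    Returns the number of attempts made.  Each failed attempt is the point
    where a scheduler would send its failure mail, so spacing the attempts
    further and further apart reduces the mail volume.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        if cancel(job_id):  # hypothetical callback, e.g. a wrapper around qdel
            return attempt
        if attempt < max_attempts:
            sleep(next(delays))  # wait longer after every failure
    return max_attempts
```

  With these assumed defaults, a job that never cancels is retried after
1, 2, 4, 8, 16, and 32 minutes and then once an hour, instead of on every
scheduling iteration.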

  Second, in TORQUE, a number of sites are working to allow pbs_server
to force a job cancel and fix the original problem.  We hope to see
these features rolled in over the next few months.

Dave 

On Wed, 2005-03-23 at 14:25 -0700, Maestas, Christopher Daniel wrote:
> Hello,
> 
> There were some posts a while back about excessive email alerts being
> sent when the pbs_mom for node 0 died.
> However, I think we've encountered yet another case.  When Maui
> cannot correctly cancel a job via the following params
> ---
> #
> # ensure jobs get killed, when they should
> #
> JOBMAXOVERRUN   00:1:00
> WCVIOLATIONACTION CANCEL
> ---
> 
> It will continuously try to send email to the jobid, over and over.
> All the pbs_mom daemons are up for the jobid.  But the following
> behaviors occur:
> 	canceljob JOBID - fails
> 	qdel JOBID - fails
> 	qsig -s SIGNULL - works after a couple of tries, or maybe on the
> 	first try if I was patient!
> 
> Any clues?
> 
> Thanks,
> 
> - Chris
> 
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://supercluster.org/mailman/listinfo/torqueusers


