[torqueusers] Email spam from maui/torque
jacksond at clusterresources.com
Wed Mar 23 18:38:32 MST 2005
There are two efforts being made to address this. First, we are
rolling in some facilities used in Moab to provide exponential back-off
of failed cancel job requests in Maui. This should minimize the amount
of mail sent.
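
To illustrate the back-off idea, here is a minimal Python sketch. It is not
the Maui or Moab code; the names backoff_delays, cancel_with_backoff,
try_cancel and notify, and the default base, factor and cap values, are all
assumptions made up for this example. Each failed cancel sends one notice,
and the wait before the next attempt grows by a fixed factor up to a cap.

```python
import time


def backoff_delays(base=60.0, factor=2.0, cap=86400.0):
    """Yield the wait in seconds before each successive retry.

    base, factor and cap are illustrative defaults, not Maui settings.
    """
    delay = base
    while True:
        yield min(delay, cap)
        delay *= factor


def cancel_with_backoff(try_cancel, notify, max_attempts=6,
                        sleep=time.sleep, base=60.0, factor=2.0,
                        cap=86400.0):
    """Retry a cancel request, waiting longer after every failure.

    try_cancel: hypothetical callable, returns True once the job is gone.
    notify:     hypothetical callable, called with the attempt number
                each time a cancel fails (for example, to send one mail).
    Returns True on success, False if every attempt failed.
    """
    delays = backoff_delays(base, factor, cap)
    for attempt in range(1, max_attempts + 1):
        if try_cancel():
            return True
        notify(attempt)
        if attempt == max_attempts:
            break  # no point waiting after the final attempt
        sleep(next(delays))
    return False
```

With these assumed defaults, six failed attempts produce six notices spread
over roughly half an hour (waits of 1, 2, 4, 8 and 16 minutes), rather than
one notice per scheduling pass.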
Second, in TORQUE, a number of sites are working to allow pbs_server
to force a job cancel and fix the original problem. We hope to see
these features rolled in within the next few months.
On Wed, 2005-03-23 at 14:25 -0700, Maestas, Christopher Daniel wrote:
> There were some posts a while back about the excessive email alerts
> that used to be sent when the pbs_mom for node 0 died.
> However I think there is yet another case we've encountered. When Maui
> cannot correctly cancel a job via the following params
> # ensure jobs get killed, when they should
> JOBMAXOVERRUN 00:1:00
> WCVIOLATIONACTION CANCEL
> It will continuously try to send email to the jobid over and over.
> All the pbs_mom daemons are up for the jobid. But the following
> behaviors occur:
> canceljob JOBID - fails
> qdel JOBID - fails
> qsig -s SIGNULL - works after a couple times, or maybe once if I
> was patient!
> Any clues?
> - Chris