[torqueusers] Email spam from maui/torque
David Jackson
jacksond at clusterresources.com
Wed Mar 23 18:38:32 MST 2005
Chris,
There are two efforts being made to address this. First, we are
rolling in some facilities used in Moab to provide exponential back-off
of failed cancel job requests in Maui. This should minimize the amount
of mail sent.
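
As a rough illustration of the back-off idea only (this is not the Maui or Moab code; the function names, the 60-second base delay, the doubling factor, and the one-hour cap are all assumptions made for the sketch), each failed cancel request waits longer than the last before retrying, so a job that cannot be cancelled generates a handful of notifications rather than one per scheduling pass:

```python
import time


def backoff_delays(base=60.0, factor=2.0, cap=3600.0):
    """Yield the wait in seconds before each successive retry.

    The delay starts at `base`, is multiplied by `factor` after every
    failure, and never exceeds `cap`. All three values are illustrative.
    """
    delay = base
    while True:
        yield delay
        delay = min(delay * factor, cap)


def cancel_with_backoff(try_cancel, notify, max_attempts=8, sleep=time.sleep):
    """Retry a cancel request, waiting exponentially longer between tries.

    try_cancel: hypothetical callable returning True once the job is gone.
    notify:     hypothetical callable that sends one alert (e.g. an email).
    Returns the attempt number that succeeded, or None if all failed.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        if try_cancel():
            return attempt
        delay = next(delays)
        # One alert per failed attempt; the growing delay limits how many
        # alerts are sent in any given period.
        notify(f"cancel attempt {attempt} failed; next retry in {delay:.0f}s")
        sleep(delay)
    return None
```

With these assumed defaults, eight failed attempts are spread over a little more than two hours, so at most eight messages go out in that time.
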
Secondly, in TORQUE, a number of sites are working to allow pbs_server
to force a job cancel and fix the original problem. We hope to see
these features rolled in within the next few months.
Dave
On Wed, 2005-03-23 at 14:25 -0700, Maestas, Christopher Daniel wrote:
> Hello,
>
> There were some posts a while back about excessive email alerts
> being sent when the pbs_mom for node 0 died.
> However I think there is yet another case we've encountered. When Maui
> cannot correctly cancel a job via the following params
> ---
> #
> # ensure jobs get killed, when they should
> #
> JOBMAXOVERRUN 00:1:00
> WCVIOLATIONACTION CANCEL
> ---
>
> It will continuously try to send email to the jobid over and over and
> over.
> All the pbs_mom daemons are up for the jobid. But the following
> behaviors occur:
> canceljob JOBID - fails
> qdel JOBID - fails
> qsig -s SIGNULL - works after a couple times, or maybe once if I
> was patient!
>
> Any clues?
>
> Thanks,
>
> - Chris