[torqueusers] excessive emails
musson at clusterresources.com
Thu Jun 30 16:16:55 MDT 2005
Maui has been updated so that if a job is in the exiting state (E), Maui
will no longer try to cancel the job. This should resolve the issue of
thousands of emails going out when Maui tries to kill a job that is
already exiting. This change is present in the latest patch14 snapshot.
On Mon, 2005-06-27 at 09:45 -0700, Garrick Staples wrote:
> On Mon, Jun 27, 2005 at 12:50:35PM +0200, Roy Dragseth alleged:
> > On Monday 27 June 2005 08:02, Garrick Staples wrote:
> > > This is already filed in bugzilla #61. The general idea is that maui is
> > > telling pbs_server to kill a job, but for whatever reason pbs_mom isn't
> > > doing it. The problem is that users are getting an email each time;
> > > possibly hundreds of emails.
> > >
> > > Does anyone have any good ideas on how pbs_server can be smarter about
> > > this?
> > >
> > > I'm thinking that a generalized mail rate limiter can be done with a new
> > > "minimum time between emails per job" server attribute. pbs_server could
> > > record the timestamp of the last email sent in a new job attribute and
> > > refuse to send emails if enough time hasn't elapsed yet. It is a simple,
> > > easily understood mechanism that is trivially coded, but could easily
> > > discard useful email.
> > >
> > > Maybe we could also record the last "reason" and take that into account.
> > > Maybe we could keep counters for each type of email. I don't know.
> > >
> > > Anyone else have any ideas?
> > I really like the way maui handles this by letting one specify a notification
> > program that takes care of handling the report. Then I can customize who
> > gets what messages and so on.
> That's certainly something that could be done. Do you have any scripts that
> ratelimit messages? Maybe we could adapt
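
For reference, the "minimum time between emails per job" idea quoted above can be sketched in a few lines. This is only an illustration in Python, not TORQUE or Maui code: the class name JobMailLimiter, the min_interval parameter (standing in for the proposed server attribute), and the last_sent table (standing in for the proposed per-job timestamp attribute) are all hypothetical names chosen for the example.

```python
import time
from typing import Dict, Optional


class JobMailLimiter:
    """Hypothetical per-job mail rate limiter (illustration only)."""

    def __init__(self, min_interval: float) -> None:
        # Assumed stand-in for the proposed server attribute
        # "minimum time between emails per job", in seconds.
        self.min_interval = min_interval
        # Assumed stand-in for the proposed per-job attribute holding
        # the timestamp of the last email actually sent.
        self.last_sent: Dict[str, float] = {}

    def should_send(self, job_id: str, now: Optional[float] = None) -> bool:
        """Return True if an email for job_id may be sent at time now."""
        if now is None:
            now = time.time()
        last = self.last_sent.get(job_id)
        if last is not None and now - last < self.min_interval:
            # Not enough time has elapsed: refuse to send. The timestamp
            # is left alone so that suppressed emails do not extend the
            # quiet period.
            return False
        self.last_sent[job_id] = now
        return True


if __name__ == "__main__":
    limiter = JobMailLimiter(min_interval=300)
    # Simulate repeated kill attempts on one job, ten seconds apart.
    for t in range(0, 601, 10):
        if limiter.should_send("1234.server", now=t):
            print("t=%d: send email" % t)
```

As the quoted message notes, a limiter like this is simple but blind: it drops any email that arrives inside the interval, including ones with a different and possibly useful reason. Keying the table on (job id, reason) instead of the job id alone would be one way to try the "record the last reason" variant.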