[Mauiusers] Re: [torqueusers] excessive emails
garrick at usc.edu
Thu Jun 30 18:28:17 MDT 2005
On Thu, Jun 30, 2005 at 04:16:55PM -0600, Michael Musson alleged:
> Maui has been updated so that if a job is in the exiting state (E), Maui
> will no longer try to cancel the job. This should resolve the issue of
> thousands of emails going out when Maui tries to kill a job that is
> already exiting. This change is present in the latest patch14 snapshot.
I don't think this is going to solve the problem. When I've seen these loops,
it's because the mother superior isn't responding. In that case, the job is
never put into E state because the mom isn't around to tell pbs_server that the
job is exiting.
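To make the failure mode concrete, here is a toy Python model. It is not Maui or TORQUE code, and every name in it is hypothetical. It assumes two things, both as described in this thread: each cancel request pbs_server handles produces one user email, and a job only reaches E state once the mother superior acknowledges the kill. Under those assumptions, the E-state guard stops the loop when the mom answers, but not when it doesn't.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical, simplified job record: "R" = running, "E" = exiting.
    job_id: str
    state: str = "R"
    emails_sent: int = 0


def should_cancel(job: Job) -> bool:
    """The guard from the patch14 change: skip jobs that are already exiting."""
    return job.state != "E"


def server_cancel(job: Job, mom_responding: bool) -> None:
    """Toy model of pbs_server handling one cancel request.

    Assumption: every request mails the user once, and the job only
    moves to E after the mother superior acknowledges the kill.
    """
    job.emails_sent += 1
    if mom_responding:
        job.state = "E"


def run_scheduler(job: Job, mom_responding: bool, iterations: int) -> int:
    """Run the scheduler loop and return how many emails the user got."""
    for _ in range(iterations):
        if should_cancel(job):
            server_cancel(job, mom_responding)
    return job.emails_sent
```

`run_scheduler(Job("1"), mom_responding=True, iterations=100)` returns 1. The same call with `mom_responding=False` returns 100, because the job never leaves R and the guard never fires.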
> Mike M.
> On Mon, 2005-06-27 at 09:45 -0700, Garrick Staples wrote:
> > On Mon, Jun 27, 2005 at 12:50:35PM +0200, Roy Dragseth alleged:
> > > On Monday 27 June 2005 08:02, Garrick Staples wrote:
> > > > This is already filed in bugzilla #61. The general idea is that maui is
> > > > telling pbs_server to kill a job, but for whatever reason pbs_mom isn't
> > > > doing it. The problem is that users are getting an email each time;
> > > > possibly hundreds of emails.
> > > >
> > > > Does anyone have any good ideas on how pbs_server can be smarter about
> > > > this?
> > > >
> > > > I'm thinking that a generalized mail rate limiter can be done with a new
> > > > "minimum time between emails per job" server attribute. pbs_server could
> > > > record the timestamp of the last email sent in a new job attribute and
> > > > refuse to send emails if enough time hasn't elapsed yet. It is a simple,
> > > > easily understood mechanism that is trivially coded, but could easily
> > > > discard useful email.
> > > >
> > > > Maybe we could also record the last "reason" and take that into account.
> > > > Maybe we could keep counters for each type of email. I don't know.
> > > >
> > > > Anyone else have any ideas?
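Below is a minimal sketch of the proposed limiter. It is written in Python for brevity, not in pbs_server's C. The per-job fields and the `min_interval` parameter are hypothetical stand-ins for the new job and server attributes proposed above; neither exists in TORQUE. The `reason_aware` flag folds in the "last reason" idea, so that a different kind of event is not swallowed by the window.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JobMailState:
    # Hypothetical per-job attributes; none of these exist in pbs_server.
    last_mail_time: Optional[float] = None
    last_reason: Optional[str] = None
    counts: Counter = field(default_factory=Counter)


def may_send_mail(state: JobMailState, reason: str, now: float,
                  min_interval: float, reason_aware: bool = True) -> bool:
    """Return True if an email for this job should go out now.

    min_interval stands in for the proposed "minimum time between
    emails per job" server attribute (hypothetical name).
    """
    state.counts[reason] += 1  # count every mail request, sent or suppressed
    if state.last_mail_time is not None:
        too_soon = now - state.last_mail_time < min_interval
        same_reason = state.last_reason == reason
        # Plain variant: any mail inside the window is dropped.
        # Reason-aware variant: only repeats of the same reason are dropped.
        if too_soon and (same_reason or not reason_aware):
            return False
    state.last_mail_time = now
    state.last_reason = reason
    return True
```

The reason-aware variant is one way to address the worry that a plain time window "could easily discard useful email". Keeping the counters for suppressed mail would also let a later message say how many were dropped.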
> > >
> > > I really like the way maui handles this by letting one specify a notification
> > > program that takes care of handling the report. Then I can customize who
> > > gets what messages and so on.
> > That's certainly something that could be done. Do you have any scripts that
> > rate-limit messages? Maybe we could adapt
> > http://dcs.nac.uci.edu/~strombrg/rate-limit.html
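For the notification-program route, the filter can live entirely outside pbs_server. Below is a hedged sketch of the rate-limiting core such a script might use: a sliding window that forwards at most `max_count` messages per key in any `window`-second span.

This sketch does not reproduce the rate-limit tool linked above. Several details are assumptions:

* the key format
* the state-file path
* how the script gets invoked

Actual mail delivery is also left out.

```python
import json
import os
import tempfile
import time
from typing import Optional


def should_forward(state_path: str, key: str, max_count: int,
                   window: float, now: Optional[float] = None) -> bool:
    """Forward at most max_count messages per key in any window-second span.

    State lives in a JSON file because a notification program runs as a
    fresh process for every event. Keys such as "<jobid>:<event>" are a
    hypothetical convention.
    """
    if now is None:
        now = time.time()
    try:
        with open(state_path) as f:
            state = json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        state = {}
    # Keep only the timestamps that are still inside the window.
    recent = [t for t in state.get(key, []) if now - t < window]
    allowed = len(recent) < max_count
    if allowed:
        recent.append(now)
    state[key] = recent
    # Write to a temp file and rename, so a reader never sees a half-written
    # file. This does NOT serialize concurrent updates; that needs a lock.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(state_path)))
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, state_path)
    return allowed
```

A wrapper would call `should_forward(...)` with a key built from the job id and event type, and only hand the message to the mailer when it returns True.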
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
> mauiusers mailing list
> mauiusers at supercluster.org
Garrick Staples, Linux/HPCC Administrator
University of Southern California