[Mauiusers] Re: [torqueusers] excessive emails

Garrick Staples garrick at usc.edu
Thu Jun 30 18:28:17 MDT 2005


On Thu, Jun 30, 2005 at 04:16:55PM -0600, Michael Musson alleged:
> All,
> 
> Maui has been updated so that if a job is in the exiting state (E), Maui
> will no longer try to cancel the job.  This should resolve the issue of
> thousands of emails going out when Maui tries to kill a job that is
> already exiting.  This change is present in the latest patch14 snapshot.

I don't think this is going to solve the problem.  When I've seen these loops,
it's because the mother superior isn't responding.  In that case, the job is
never put into the E state, because the mom isn't around to tell pbs_server
that the job is exiting.
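
For reference, the "minimum time between emails per job" limiter proposed
earlier in this thread (quoted below) could look roughly like the following.
This is only a sketch in Python, not pbs_server code: the class name, the
min_interval parameter, and the optional "reason" key are all made up for
illustration and do not correspond to any existing TORQUE attribute.

```python
import time


class MailRateLimiter:
    """Hypothetical per-job mail limiter (illustration only, not TORQUE code)."""

    def __init__(self, min_interval, clock=time.monotonic):
        # min_interval: assumed "minimum seconds between emails per job".
        self.min_interval = min_interval
        # clock is injectable so the behaviour can be checked without sleeping.
        self.clock = clock
        # Maps (job_id, reason) -> timestamp of the last email actually sent.
        self.last_sent = {}

    def should_send(self, job_id, reason=None):
        """Return True and record the time if an email may go out now."""
        key = (job_id, reason)
        now = self.clock()
        last = self.last_sent.get(key)
        if last is not None and now - last < self.min_interval:
            # Too soon since the last email for this job/reason: drop it.
            return False
        self.last_sent[key] = now
        return True
```

Keying on the reason as well as the job id means a different kind of email
for the same job still gets through, which addresses part of the worry about
discarding useful mail.  It does nothing about the underlying kill loop.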


> Mike M.
> 
> On Mon, 2005-06-27 at 09:45 -0700, Garrick Staples wrote:
> > On Mon, Jun 27, 2005 at 12:50:35PM +0200, Roy Dragseth alleged:
> > > On Monday 27 June 2005 08:02, Garrick Staples wrote:
> > > > This is already filed in bugzilla #61.  The general idea is that maui is
> > > > telling pbs_server to kill a job, but for whatever reason pbs_mom isn't
> > > > doing it.  The problem is that users are getting an email each time;
> > > > possibly hundreds of emails.
> > > >
> > > > Does anyone have any good ideas on how pbs_server can be smarter about
> > > > this?
> > > >
> > > > I'm thinking that a generalized mail rate limiter could be added with a
> > > > new "minimum time between emails per job" server attribute.  pbs_server could
> > > > record the timestamp of the last email sent in a new job attribute and
> > > > refuse to send emails if enough time hasn't elapsed yet.  It is a simple,
> > > > easily understood mechanism that is trivially coded, but could easily
> > > > discard useful email.
> > > >
> > > > Maybe we could also record the last "reason" and take that into account. 
> > > > Maybe we could keep counters for each type of email.  I don't know.
> > > >
> > > > Anyone else have any ideas?
> > > 
> > > I really like the way maui handles this by letting one specify a notification 
> > > program that takes care of handling the report.   Then I can customize who 
> > > gets what messages and so on.
> > 
> > That's certainly something that could be done.  Do you have any scripts that
> > rate-limit messages?  Maybe we could adapt
> > http://dcs.nac.uci.edu/~strombrg/rate-limit.html
> > 
> > 
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California