[torqueusers] Problem with excessive and incorrect "Begun execution" mails

Andrew J Caird acaird at umich.edu
Wed Dec 21 07:13:41 MST 2005


We've seen the same thing for jobs that can't end for some reason, but 
haven't looked into it very far yet.  Has anyone else seen this, too?

Thanks for your patch for MAIL_BEGIN!

--andy

On Wed, 21 Dec 2005, Åke Sandgren wrote:

> Hi!
>
> We have been having lots of problems with excessive MAIL_BEGIN mails
> being sent to users.
>
> We have a prolog script that verifies that there is enough free space on
> a certain filesystem before allowing jobs to actually start.
> If there isn't, the prolog script exits with status 3 to requeue the job.
>
> This has been generating multiple MAIL_BEGIN mails for the same jobid
> and annoying users a lot, because the server sends the MAIL_BEGIN mail
> before verifying that the mom has actually started the job.
>
> The attached patch is a first attempt at remedying this: it delays
> sending the MAIL_BEGIN until the server has gotten the session id back
> from the mom. A quick test on 2.0.0p4 showed that it worked for my
> test case: with a mom prolog that does exit 3 when the user is me, the
> excessive MAIL_BEGIN mails stopped, and I got a correct MAIL_BEGIN
> once the if-statement was removed.
>
> Please take a look and comment.
>
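
For anyone who wants to reproduce this, a prolog check of the kind described above might look roughly like the following. This is only a sketch and not the script from the original report: the function names, SCRATCH_DIR, and MIN_FREE_KB are made up for illustration, and the only behaviour taken from the message above is that exiting with status 3 requeues the job.

```shell
#!/bin/sh
# Hypothetical prolog sketch: requeue the job when free space is low.
# SCRATCH_DIR and MIN_FREE_KB are illustrative names, not TORQUE settings.

# Print the free space (in KB) of the filesystem holding "$1".
free_kb() {
    # -P requests POSIX output (one line per filesystem); column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Print the status the prolog should exit with:
# 0 to let the job start, 3 to requeue it (as described in the report above).
prolog_status() {
    dir=$1
    min_kb=$2
    avail=$(free_kb "$dir")
    if [ -n "$avail" ] && [ "$avail" -ge "$min_kb" ]; then
        echo 0
    else
        echo 3
    fi
}

# In a real prolog the last line would be something like:
#   exit "$(prolog_status "${SCRATCH_DIR:-/tmp}" "${MIN_FREE_KB:-1048576}")"
```

Keeping the decision in a function that prints the status, instead of calling exit directly, makes it easy to try the logic from an interactive shell before installing the script on a mom.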

