[torqueusers] Torque set up problem: simple jobs not executing and files undelivered

Jerry Smith jdsmit at sandia.gov
Thu Feb 14 07:34:08 MST 2008


Do you want the mail delivered on the localhost, or do you want it
sent out to an address in your domain?

We use postfix on the pbs_server node, and then within qmgr we set:
set server mail_domain = <here.com>

We also set:
set server mail_from = admin@here.com

You can also do this at compile time via the configure option
--with-maildomain=here.com

This gets us mail sent to the user's work-related email address, instead
of to a mailbox on the localhost.
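
For anyone who wants to script the two settings above, here is a rough
sketch. The here.com values are placeholders, and the MAIL_DOMAIN,
MAIL_FROM and APPLY variables and the mail_directives function are names
I made up for illustration. It only prints the directives unless APPLY=1
is set and qmgr is on the PATH.

```shell
#!/bin/bash
# Placeholder values (assumptions); substitute your own domain and sender.
MAIL_DOMAIN="${MAIL_DOMAIN:-here.com}"
MAIL_FROM="${MAIL_FROM:-admin@here.com}"

# Print the two qmgr directives so they can be reviewed before use.
mail_directives() {
    printf 'set server mail_domain = %s\n' "$MAIL_DOMAIN"
    printf 'set server mail_from = %s\n' "$MAIL_FROM"
}

# Apply only when explicitly requested and qmgr is installed;
# otherwise act as a dry run and just show the directives.
if [ "${APPLY:-0}" = "1" ] && command -v qmgr >/dev/null 2>&1; then
    mail_directives | while IFS= read -r line; do
        qmgr -c "$line"
    done
else
    mail_directives
fi
```

Run it once without APPLY to check the output, then again with APPLY=1
as a TORQUE manager on the pbs_server node.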

Jerry

Adil Mughal wrote:
> Dear Garrick and other Torque experts,
>
> I managed to get jobs running on my network - as you rightly said it
> was not a Torque problem - it was something to do with my
> .bash_profile file. I reproduce a copy of my modified .bash_profile
> file below for any other novice who may be having the same problem.
>
> Unfortunately I still cannot get the .OU and .ER messages to go to the
> right place. Furthermore - I don't know where to find the error
> messages which ought to be e-mailed to me. Typing "mail" as either
> root or non-root user does not show any messages generated by Torque.
>
> At what point in the TORQUE set up should I have configured the mail?
>
> As always many thanks for any help
>
> adil
>
> #
> #       .bash_profile called at start of login shell
> #
> ##source /users/PROTOUSER/profile.bash
> #
> #       Users may add their own commands here
> #
>
> if [ -f ~/.bashrc ]
> then
>         . ~/.bashrc
> fi
> ## . /users/PROTOUSER/common_profile
> bind '"\eOP": dynamic-complete-history'
> xto ()
> {
> DISPLAY=$1:0.0; export DISPLAY
> echo DISPLAY set to $DISPLAY
> }
> echo "You are now running on $HOSTNAME in a BASH environment."
> echo ""
>
>
> On Tue, Feb 12, 2008 at 7:20 PM, Garrick Staples <garrick at usc.edu> wrote:
>> On Tue, Feb 12, 2008 at 05:26:55PM +0000, Adil Mughal alleged:
>>
>>> at no point does the job status register as "R" - it appears to be stuck in "E".
>>  >
>>  > I also found that the .ER and .OU files for the jobs are not being
>>  > delivered and are piling up in /var/spool/torque/undelivered. Here is
>>
>>  The errors for failing to deliver output files are emailed to the user.
>>
>>  Errors for failing to setup the initial job env are sent to syslog or the MOM
>>  log.
>>
>>
>>
>>  > the content of these files as a result of running > echo "sleep 30" |
>>  > qsub
>>  >
>>  > .ER
>>  >
>>  > stdin: is not a tty
>>  >
>>  > and in .OU    I get
>>  >
>>  > Terminal type (default=dumb) : Terminal type
>>  > /var/spool/torque/mom_priv/jobs/94.dphpc101.SC invalid - using dumb
>>  > You are now running on dphpc1001 in a BASH environment.
>>
>>  These are not TORQUE errors.  These are generated by the job, the shell, or
>>  something else in the OS.
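
Messages like "stdin: is not a tty" usually come from login scripts that
run terminal-only commands in a batch shell, which has no terminal. One
way to avoid them is to guard that part of .bash_profile. This is a
minimal sketch, assuming the terminal-dependent lines are the ones in
the profile quoted above; has_terminal is a helper name invented here.

```shell
# Succeeds only when stdin is attached to a terminal (interactive login).
has_terminal() {
    [ -t 0 ]
}

if has_terminal; then
    # Terminal-only setup belongs inside this guard, for example the
    # bind '"\eOP": dynamic-complete-history' line and the banner below.
    echo "You are now running on $HOSTNAME in a BASH environment."
    echo ""
fi
```

With the guard in place, a job started by pbs_mom skips the banner, so
it no longer ends up in the .OU file.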
>>
>>
>>
>>  > Also I am using an nfs system - here is the content of my mom_priv/config file:
>>  >
>>  > $pbsserver dphpc1011.dph.xxxx.xx.xx
>>  >
>>  > $usecp dphpc1011.dph.xxxx.xx.xx:/home  /home
>>
>>  Verify that 'df' actually shows the filesystem mounted from
>>  'dphpc1011.dph.xxxx.xx.xx:/home'.  Since you aren't using a wildcard, the exact
>>  string must match.
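
The df check described above can be sketched as follows. The source
string is copied from the $usecp line quoted in this thread;
usecp_source_mounted is a hypothetical helper name, and it compares the
first column of POSIX-format df output with that string exactly.

```shell
# usecp_source_mounted SOURCE
# Reads `df -P` output on stdin and succeeds if some filesystem's
# device column (field 1) is exactly SOURCE.
usecp_source_mounted() {
    awk -v src="$1" 'NR > 1 && $1 == src { found = 1 } END { exit !found }'
}

# Source string as written in mom_priv/config (taken from the thread).
SRC="dphpc1011.dph.xxxx.xx.xx:/home"

if df -P /home 2>/dev/null | usecp_source_mounted "$SRC"; then
    echo "df shows $SRC: the \$usecp entry should match"
else
    echo "no exact match for $SRC in df output: check the \$usecp entry"
fi
```

If df reports a short hostname or a different export path, the strings
differ and the $usecp line needs to be adjusted to what df shows.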
>>
>>
>>
>>  > $logevent       255
>>  >
>>  > Any ideas why the .ER and .OU files are not going to the right places??
>>
>>  You'll need to check the logs and the email that should have been sent.
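
A rough sketch of that log check, assuming TORQUE lives under
/var/spool/torque as in this thread and that the MOM writes one log file
per day named YYYYMMDD under mom_logs. The job ID is the one from the
.OU output quoted above; job_log_lines is a helper name made up here.

```shell
# Spool directory used in this thread; override TORQUE_HOME if yours differs.
TORQUE_HOME="${TORQUE_HOME:-/var/spool/torque}"

# job_log_lines JOBID LOGFILE
# Print the lines of LOGFILE that mention JOBID (fixed-string match).
job_log_lines() {
    grep -F -- "$1" "$2"
}

# Today's MOM log, assuming the usual date-named files.
today_log="$TORQUE_HOME/mom_logs/$(date +%Y%m%d)"
if [ -r "$today_log" ]; then
    job_log_lines "94.dphpc101" "$today_log" || echo "no entries for that job today"
else
    echo "cannot read $today_log (run this on the compute node, usually as root)"
fi

# Output files that could not be delivered are left here.
ls -l "$TORQUE_HOME/undelivered" 2>/dev/null || true
```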
>>
>>
>> _______________________________________________
>>  torqueusers mailing list
>>  torqueusers at supercluster.org
>>  http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>>
>>     