[torqueusers] Torque set up problem: simple jobs not executing and files undelivered

Garrick Staples garrick at usc.edu
Tue Feb 12 12:20:54 MST 2008


On Tue, Feb 12, 2008 at 05:26:55PM +0000, Adil Mughal alleged:
> at no point does the job status register as "R" - it appears to be stuck in "E".
> 
> I also found that the .ER and .OU files for the jobs are not being
> delivered and are piling up in /var/spool/torque/undelivered. Here is

The errors for failing to deliver output files are emailed to the user.

Errors from failing to set up the initial job environment are sent to syslog or
the MOM log.
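
A rough sketch of pulling one job's entries out of the MOM log. It assumes the
usual layout of one log file per day, named YYYYMMDD, under mom_logs in the
/var/spool/torque directory mentioned above. The function name is made up for
illustration, and "94.dphpc101" is only the job ID prefix visible in the quoted
output, so adjust both the path and the ID to your site.

```shell
# Print the lines of a log file ($1) that mention a job ID ($2).
# grep -F treats the ID as a fixed string, so the dots are not wildcards.
job_log_lines() {
    grep -F -- "$2" "$1"
}

# Assumed location: today's MOM log under the TORQUE spool directory.
log="/var/spool/torque/mom_logs/$(date +%Y%m%d)"
if [ -r "$log" ]; then
    job_log_lines "$log" "94.dphpc101" || echo "no entries for that job in $log"
fi
```

Run it on the compute node that executed the job, since each MOM keeps its own
log.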


> the content of these files as a result of running > echo "sleep 30" |
> qsub
> 
> .ER
> 
> stdin: is not a tty
> 
> and in .OU    I get
> 
> Terminal type (default=dumb) : Terminal type
> /var/spool/torque/mom_priv/jobs/94.dphpc101.SC invalid - using dumb
> You are now running on dphpc1001 in a BASH environment.

These are not TORQUE errors.  These are generated by the job, the shell, or
something else in the OS.


> Also I am using an nfs system - here is the content of my mom_priv/config file:
> 
> $pbsserver dphpc1011.dph.xxxx.xx.xx
> 
> $usecp dphpc1011.dph.xxxx.xx.xx:/home  /home

Verify that 'df' actually shows the filesystem mounted from
'dphpc1011.dph.xxxx.xx.xx:/home'.  Since you aren't using a wildcard, the exact
string must match.
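
A minimal sketch of that check, assuming POSIX-style 'df -P' output in which the
first column is the mounted source. The function name is hypothetical, and the
host string is copied as-is from the config quoted above.

```shell
# Succeed only if some line of df-style input on stdin has a first field
# (the mount source) exactly equal to $1. No wildcards, no partial matches.
has_mount_source() {
    awk -v src="$1" '$1 == src { found = 1 } END { exit !found }'
}

# Compare against the exact string used on the $usecp line.
src='dphpc1011.dph.xxxx.xx.xx:/home'
if df -P 2>/dev/null | has_mount_source "$src"; then
    echo "df shows $src"
else
    echo "df does not show $src exactly"
fi
```

If df reports the server under a different form of the name (short hostname, IP
address, trailing slash), this will report no match even though /home is
mounted.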

 
> $logevent       255
> 
> Any ideas why the .ER and .OU files are not going to the right places??

You'll need to check the logs and the email that should have been sent.
