[torqueusers] Re: watch the job being done

David Corredor tecnico at nsstc.uah.edu
Tue Aug 22 12:43:06 MDT 2006


I posted a similar comment/workaround a month or so ago. In summary, this
is what I do:

In my job script (written in bash, with #!/bin/bash as the first line):


# Run the parallel MPI executable using mpiexec
MPIEXEC_CMD="/usr/local/bin/mpiexec $EXE"

# redirect (stdout) and (stderr) to files in user directory
$MPIEXEC_CMD 1> output."${PBS_JOBID%%.*}" 2>error."${PBS_JOBID%%.*}"

# or combined (stdout) and (stderr); the file redirect must come before 2>&1
#$MPIEXEC_CMD > output."${PBS_JOBID%%.*}" 2>&1


That way you don't have to wait until the job is done and Torque copies
the log file from the spool back to the user's directory. The output files
are written directly to the user's directory as error.JOBID and
output.JOBID, or as a single combined output.JOBID.
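
For anyone who wants to try the redirection outside of Torque first, here is
a minimal sketch. The run_job function is a hypothetical stand-in for the
real mpiexec command, the fallback job ID is made up, and the scratch
directory and the combined.JOBID file name are used only for this demo; none
of them come from my actual job script.

```shell
#!/bin/bash
# Sketch only: run_job is a hypothetical stand-in for the mpiexec command,
# and the fallback job ID is invented so the script also runs outside Torque.
PBS_JOBID="${PBS_JOBID:-12345.headnode.example.org}"

# %%.* strips the longest suffix that starts at the first dot,
# leaving only the leading part of the job ID.
SHORT_ID="${PBS_JOBID%%.*}"

# Use a scratch directory for the demo; a real job would write
# somewhere under the user's own directory instead.
WORKDIR="$(mktemp -d)"
cd "$WORKDIR" || exit 1

run_job() {
    echo "step 1 done"          # written to stdout
    echo "warning: demo" >&2    # written to stderr
}

# Separate files for stdout and stderr
run_job 1> "output.$SHORT_ID" 2> "error.$SHORT_ID"

# Combined: send stdout to the file first, then point stderr at stdout
run_job > "combined.$SHORT_ID" 2>&1
```

While a real job is running, something like "tail -f output.JOBID" from the
login node should show the progress, assuming the directory is on a
filesystem shared between the login node and the compute nodes.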

I had to do this because some jobs (RAMS model) were dying, but for some
reason the process wasn't killed and remained sleeping, so my users
were waiting for the output file forever and didn't know the job had
failed.





