[Torqueusers] watch the job being done

Brock Palen brockp at umich.edu
Wed Aug 23 12:07:02 MDT 2006


Look at the qpeek tool from pbstools at:
http://www.osc.edu/~troy/pbs/

This lets you look at a running job's STDOUT and STDERR files while they
are still held on the mother node.
Brock Palen
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
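
For readers without pbstools installed, the idea can be sketched by hand: find the mother node from the job's exec_host, then read the spooled file there. This is only a rough sketch and not how qpeek itself is implemented. The spool directory /usr/spool/PBS/spool is an assumption based on the mom_priv path quoted further down, the .OU/.ER suffixes are assumed to follow the usual TORQUE spool naming, and spool_file, mother_node and peek_job are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: hand-rolled peek at a running job's spooled output.
# SPOOL_DIR is an assumption; adjust it to your site's PBS home.
SPOOL_DIR="${SPOOL_DIR:-/usr/spool/PBS/spool}"

# Build the spool file path for a full job id (e.g. 1234.server).
# $2 is OU for stdout or ER for stderr; defaults to OU.
spool_file() {
    printf '%s/%s.%s\n' "$SPOOL_DIR" "$1" "${2:-OU}"
}

# Extract the first host from an exec_host value such as "node07/0+node08/0".
mother_node() {
    printf '%s\n' "${1%%/*}"
}

# Follow the spooled file on the mother node (needs qstat and ssh access).
# $1 must be the full job id as shown by qstat.
peek_job() {
    jobid=$1
    exec_host=$(qstat -f "$jobid" | sed -n 's/^ *exec_host = //p')
    ssh "$(mother_node "$exec_host")" tail -f "$(spool_file "$jobid" "${2:-OU}")"
}
```

With these definitions loaded, something like `peek_job 1234.server ER` would follow the job's stderr, provided you can ssh to the compute node.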


> I posted a similar comment/workaround about a month ago. In summary,
> this is what I do:
>
> In my job script (written in bash, with #!/bin/bash as the first line):
>
>
> # Run the parallel MPI executable using mpiexec
> MPIEXEC_CMD="/usr/local/bin/mpiexec $EXE"
>
> # redirect (stdout) and (stderr) to files in user directory
> $MPIEXEC_CMD 1> output."${PBS_JOBID%%.*}" 2>error."${PBS_JOBID%%.*}"
>
> # or combine (stdout) and (stderr) in one file
> #$MPIEXEC_CMD > output."${PBS_JOBID%%.*}" 2>&1
>
>
> That way you don't have to wait until the job is done and torque copies
> the log file from the spool back to the user's directory. The output
> files will be written directly to the user's directory as error.JOBID
> and output.JOBID, or as a combined output.JOBID.
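
The JOBID part of those file names comes from the `${PBS_JOBID%%.*}` expansion, which strips everything from the first dot onward. A minimal illustration follows; the job id value is made up for the example and job_number is a hypothetical helper name.

```shell
#!/bin/sh
# ${var%%.*} removes the longest suffix that starts at a dot,
# i.e. everything from the first dot to the end of the string.
job_number() {
    printf '%s\n' "${1%%.*}"
}

# Example value only; inside a real job Torque sets PBS_JOBID itself.
PBS_JOBID="1234.head.example.org"
echo "output.$(job_number "$PBS_JOBID")"   # prints output.1234
echo "error.$(job_number "$PBS_JOBID")"    # prints error.1234
```

While the job runs, `tail -f output.1234` in the job's working directory follows the output as it is written.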
>
> I had to do this because some jobs (RAMS model) were dying, but for some
> reason the process wasn't killed and remained sleeping, so my users
> were waiting forever for the output file and didn't know the job had
> failed.
>
>
> ------------------------------
>
> Message: 2
> Date: Tue, 22 Aug 2006 12:44:23 -0600
> From: Garrick Staples <garrick at clusterresources.com>
> Subject: Re: [torqueusers] watch the job being done
> To: torqueusers at supercluster.org
> Message-ID: <20060822184423.GD4305 at login>
> Content-Type: text/plain; charset=us-ascii
>
> On Tue, Aug 22, 2006 at 07:41:15AM -1000, Donald Tripp alleged:
>> So, in order to see the output from a job, you must either use a command
>> to peek at the spools, or design some other output mechanism into your
>> program, such as reporting to the syslog.
>
> Or use qsub -k which keeps the output file(s) in your homedir.
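
A short sketch of the qsub -k approach, assuming the standard `-k oe` argument (keep both stdout and stderr on the execution host) and the default jobname.oSEQ / jobname.eSEQ naming. submit_keep and kept_files are hypothetical helper names, and the job name is assumed to default to the script name.

```shell
#!/bin/sh
# Submit a job script with both output streams kept in the user's
# home directory on the execution host instead of the spool.
submit_keep() {
    qsub -k oe "$1"
}

# Predict the kept file names from the job name and the job id.
# $1 = job name (defaults to the script name), $2 = job id from qsub.
kept_files() {
    jobname=$1
    seq=${2%%.*}            # numeric part of the job id
    printf '%s/%s.o%s\n' "$HOME" "$jobname" "$seq"
    printf '%s/%s.e%s\n' "$HOME" "$jobname" "$seq"
}
```

For example, after `submit_keep myjob.sh` returns an id such as 1234.head, `kept_files myjob.sh 1234.head` lists the two files to watch with tail -f.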
>
>
>
> ------------------------------
>
> Message: 3
> Date: Tue, 22 Aug 2006 14:31:12 -0600
> From: Lloyd Brown <somewhere_or_other at byu.edu>
> Subject: [torqueusers] epilogue.precancel permissions
> To: torqueusers at supercluster.org
> Message-ID: <44EB6990.9090203 at byu.edu>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hey all,
>
> We're trying to do some testing to implement the use of
> epilogue.precancel scripts
> (http://www.clusterresources.com/wiki/doku.php?id=torque:appendix:g_prologue_and_epilogue_scripts)
> on Torque 2.1.2.  According to the documentation, it needs to be
> executable by root, and not be writable by anyone else.  Is this
> accurate?  I've been trying several different variations on test scripts
> to no avail.  I've even tried leaving the script blank, with just the
> "#!/bin/bash" line at the top.  In all cases, I get an entry like the
> following in the mom log:
>
>> 08/22/2006 14:20:52;0001;   pbs_mom;Svr;pbs_mom;run_pelog, prolog/epilog failed, file: /usr/spool/PBS/mom_priv/epilogue.precancel, exit: -1, Permission Error
>> 08/22/2006 14:20:52;0001;   pbs_mom;Svr;pbs_mom;kill_job, precancel epilog failed
>
> Here are the permissions on the file:
>
>> # ls -l /usr/spool/PBS/mom_priv/epilogue.precancel
>> -r-x------  1 root root 121 Aug 22 14:19 /usr/spool/PBS/mom_priv/epilogue.precancel
>
>
> Any ideas?
>
> Thanks,
> Lloyd Brown
>
>
> ------------------------------
>
> Message: 4
> Date: Tue, 22 Aug 2006 15:29:36 -0600
> From: Garrick Staples <garrick at clusterresources.com>
> Subject: Re: [torqueusers] epilogue.precancel permissions
> To: torqueusers at supercluster.org
> Message-ID: <20060822212936.GF4305 at login>
> Content-Type: text/plain; charset=us-ascii
>
> On Tue, Aug 22, 2006 at 02:31:12PM -0600, Lloyd Brown alleged:
>> Hey all,
>>
>> We're trying to do some testing to implement the use of
>> epilogue.precancel scripts
>> (http://www.clusterresources.com/wiki/doku.php?id=torque:appendix:g_prologue_and_epilogue_scripts)
>> on Torque 2.1.2.  According to the documentation, it needs to be
>> executable by root, and not be writable by anyone else.  Is this
>> accurate?  I've been trying several different variations on test scripts
>> to no avail.  I've even tried leaving the script blank, with just the
>> "#!/bin/bash" line at the top.  In all cases, I get an entry like the
>> following in the mom log:
>>
>>> 08/22/2006 14:20:52;0001;   pbs_mom;Svr;pbs_mom;run_pelog, prolog/epilog failed, file: /usr/spool/PBS/mom_priv/epilogue.precancel, exit: -1, Permission Error
>>> 08/22/2006 14:20:52;0001;   pbs_mom;Svr;pbs_mom;kill_job, precancel epilog failed
>>
>> Here are the permissions on the file:
>>
>>> # ls -l /usr/spool/PBS/mom_priv/epilogue.precancel
>>> -r-x------  1 root root 121 Aug 22 14:19 /usr/spool/PBS/mom_priv/epilogue.precancel
>>
>>
>> Any ideas?
>
> It must also be readable and executable by others.  Just use mode 755.
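
The suggested fix can be sketched as follows. The path is the one from the mom log quoted above; set_pelog_mode and show_mode are hypothetical helper names, and the commands need to be run as root on the compute node.

```shell
#!/bin/sh
# Path taken from the mom log lines above; adjust for your PBS home.
PELOG="${PELOG:-/usr/spool/PBS/mom_priv/epilogue.precancel}"

# Set mode 755 (rwxr-xr-x) on a prologue/epilogue script.
set_pelog_mode() {
    chmod 755 "$1"
}

# Print the 10-character permission string, e.g. -rwxr-xr-x.
show_mode() {
    ls -l "$1" | cut -c1-10
}

# As root on the compute node:
#   set_pelog_mode "$PELOG" && show_mode "$PELOG"
```

After the change, the listing should show -rwxr-xr-x instead of the -r-x------ shown in the original report.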
>
>
>
> ------------------------------
>
>
> End of torqueusers Digest, Vol 25, Issue 21
> *******************************************