[torqueusers] Question to Torque community regarding display of completed jobs in qstat
Ezell, Matthew A.
ezellma at ornl.gov
Mon Dec 3 13:09:24 MST 2012
On 12/2/12 1:24 PM, "Craig Tierney - NOAA Affiliate" <craig.tierney at noaa.gov> wrote:
> I have a question for Torque users regarding the display of completed jobs in qstat. Do others find that listing completed jobs by default in qstat makes the output much harder to read, and is completely unnecessary? Having completed jobs in qstat can also slow it down significantly if there are a lot (thousands) of them, which is another hassle.
> I am asking this because I need to be able to get error codes from completed jobs (for minutes to hours after completion). To do this, they have to still be in the queue. This function is very important, but not to anyone who runs qstat by hand. Grid Engine had a way to list completed jobs, but only when asked.
Users can run 'qstat -r' to get a list of running jobs or 'qstat -i' to get a list of queued/held/waiting jobs.
My understanding is that once a job has been completed for more than keep_completed seconds, pbs_server forgets about it. Then, you have to go look in the logs.
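As a rough illustration of reading the exit code while the job is still known to pbs_server, the sketch below pulls the exit_status field out of 'qstat -f' output. The helper name job_exit_status and the job ID are made up for the example, and the 300-second value is only a placeholder.

```shell
# Assumption: the retention window is set by an administrator with something like
#   qmgr -c 'set server keep_completed = 300'
# (300 is an arbitrary example value).

# Hypothetical helper: print the exit_status field from `qstat -f` output on stdin.
job_exit_status() {
    awk -F' = ' '$1 ~ /^[[:space:]]*exit_status$/ { print $2; exit }'
}

# Usage, while the job is still within keep_completed seconds
# (12345.server is a made-up job ID):
#   qstat -f 12345.server | job_exit_status
```

Once the job has aged out of the server, the same query no longer finds it, which is why the logs or an epilogue become necessary.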
Alternatively, you could set up an epilogue to capture the exit code and funnel it to some user-accessible location (the job script, a flat file on a shared FS, a database, etc.).
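A minimal sketch of such an epilogue, using the flat-file option, might look like the following. It assumes pbs_mom passes the job ID as the first argument, the user name as the second, and the job exit code as the tenth; please check the prologue/epilogue documentation for your Torque version. The function name, the EXIT_LOG_DIR variable, and the /shared/job_exit_codes path are hypothetical.

```shell
#!/bin/sh
# Sketch of a Torque epilogue that appends each job's exit code to a per-user file.
# Assumed argument positions: $1 = job ID, $2 = user name, ${10} = job exit code.

record_exit_code() {
    jobid=$1
    user=$2
    exit_code=$3
    # Hypothetical location on a shared filesystem; override with EXIT_LOG_DIR.
    log_dir=${EXIT_LOG_DIR:-/shared/job_exit_codes}
    mkdir -p "$log_dir" || return 1
    printf '%s %s %s\n' "$jobid" "$user" "$exit_code" >> "$log_dir/$user.log"
}

# Only record when called with the full epilogue argument list.
if [ "$#" -ge 10 ]; then
    # Ignore logging failures so the epilogue itself does not exit non-zero.
    record_exit_code "$1" "$2" "${10}" || true
fi
```

Users could then grep their own file for a job ID long after keep_completed has expired. The files would need permissions that let each user read their own log.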
HPC Systems Administrator
Oak Ridge National Laboratory