[torqueusers] Question to Torque community regarding display of completed jobs in qstat
Craig Tierney - NOAA Affiliate
craig.tierney at noaa.gov
Mon Dec 3 13:15:55 MST 2012
On Mon, Dec 3, 2012 at 1:09 PM, Ezell, Matthew A. <ezellma at ornl.gov> wrote:
> On 12/2/12 1:24 PM, "Craig Tierney - NOAA Affiliate" <craig.tierney at noaa.gov> wrote:
> Hello all,
> I have a question for Torque users regarding the display of completed jobs
> in qstat. Do others find the listing of completed jobs by default via
> qstat makes finding things in the output much more difficult and completely
> unnecessary? Having the completed jobs in qstat can significantly slow
> down qstat if you have a lot (thousands) of completed jobs, which is another issue.
> I am asking this because I need to be able to get error codes from completed
> jobs (for minutes to hours after completion). To do this, they have to
> still be in the queue. This function is very important, but not to anyone
> who runs qstat by hand. Grid Engine had a way to get completed jobs, but
> only when asked for.
> Users can run 'qstat -r' to get a list of running jobs or 'qstat -i' to
> get a list of queued/held/waiting jobs.
The above is true. It would be nice if you could combine these options.
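
Until the two options can be combined, one possible workaround is to filter the default listing. The sketch below assumes the default qstat column layout, where the job state is the fifth whitespace-separated field and the first two lines are the header and its dashed separator. The function name hide_completed is made up for illustration, and it has not been checked against every qstat output variant.

```shell
# Drop rows whose state column (assumed to be field 5 of default qstat
# output) is "C". Lines 1-2 are assumed to be the header and separator,
# so they are always kept.
hide_completed() {
    awk 'NR <= 2 || $5 != "C"'
}

# Usage on a live system (not run here):
#   qstat | hide_completed
```

This only hides the completed jobs on the client side, so it does nothing for the time qstat spends fetching them from pbs_server.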
> My understanding is that once a job has been completed for more than
> keep_completed seconds, pbs_server forgets about it. Then, you have to go
> look in the logs.
Yes, and I would like to keep the jobs for one day. That would leave
40-50k jobs in completed state. A qstat with 20k completed jobs (from a
test on a slow server) showed the qstat time went to about 8 seconds.
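
For reference, keeping jobs for one day means setting keep_completed to 86400 seconds. A minimal sketch follows, assuming the value is set server-wide through qmgr. It only prints the command instead of running it, since it needs Torque manager privileges on a real server.

```shell
# One day expressed in seconds, the unit keep_completed uses.
keep_seconds=$((24 * 60 * 60))

# Build the qmgr directive; a Torque manager would run it as:
#   qmgr -c "set server keep_completed = 86400"
qmgr_cmd="set server keep_completed = ${keep_seconds}"

# Print rather than execute, so this is safe to try anywhere.
echo "qmgr -c \"${qmgr_cmd}\""
```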
> Alternatively, you could set up an epilogue to capture the exit code and
> funnel it to some user-accessible location (the job script, flat-file on a
> shared FS, database, etc).
I know I can do that, and I can ask Moab for the numbers as well. However,
the Torque server already has the information and can store it. So why
use some other mechanism to do this?
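
For anyone who does want the epilogue route Matt describes, here is a rough sketch of the flat-file variant. It relies on Torque passing the job id as the first epilogue argument, the user as the second, and the job exit code as the tenth. The function name record_exit_code and the log path in the usage comment are hypothetical.

```shell
# Append one "jobid user exit_code" line to a log file.
# record_exit_code is an illustrative helper, not part of Torque.
record_exit_code() {
    job_id=$1
    user=$2
    exit_code=$3
    log_file=$4   # hypothetical location, e.g. on a shared file system
    printf '%s %s %s\n' "$job_id" "$user" "$exit_code" >> "$log_file"
}

# Inside an actual epilogue the call would look roughly like this
# (braces are required for positional parameters past $9):
#   record_exit_code "$1" "$2" "${10}" /shared/torque/exit_codes.log
```

A flat file like this can then be searched by job id long after pbs_server has dropped the job, though it duplicates information the server already holds.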
> Matt Ezell
> HPC Systems Administrator
> Oak Ridge National Laboratory
> torqueusers mailing list
> torqueusers at supercluster.org