[torqueusers] Question to Torque community regarding display of completed jobs in qstat

Craig Tierney - NOAA Affiliate craig.tierney at noaa.gov
Mon Dec 3 13:15:55 MST 2012

On Mon, Dec 3, 2012 at 1:09 PM, Ezell, Matthew A. <ezellma at ornl.gov> wrote:

> On 12/2/12 1:24 PM, "Craig Tierney - NOAA Affiliate" <craig.tierney at noaa.gov> wrote:
>
> > Hello all,
> >
> > I have a question for Torque users regarding the display of completed jobs
> > in qstat.  Do others find that listing completed jobs by default in qstat
> > is unnecessary and makes it much harder to find things in the output?
> > Having the completed jobs in qstat can also slow qstat down significantly
> > if you have a lot (thousands) of them, which is another hassle.
> >
> > I am asking this because I need to be able to get error codes from completed
> > jobs (for minutes to hours after completion).  To do this, they have to
> > still be in the queue.  This function is very important, but not to anyone
> > who runs qstat by hand.  Grid Engine had a way to get completed jobs, but
> > only when asked for.
> >
> > Thanks,
> > Craig
>
> Users can run 'qstat -r' to get a list of running jobs or 'qstat -i' to
> get a list of queued/held/waiting jobs.

The above is true.  It would be nice if you could combine these options.
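
Until the options can be combined, one workaround is to filter the default qstat listing after the fact. The following is only a rough sketch: it assumes the default column layout, where the one-letter job state is the second-to-last field and job ids start with a digit. The function name and the sample rows are made up for illustration.

```python
# Made-up sample in the shape of default qstat output (not real job data).
SAMPLE = """\
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
100.head                  sim_a            craig           00:10:00 R batch
101.head                  sim_b            craig           00:42:13 C batch
102.head                  sim_c            craig                  0 Q batch
"""


def drop_completed(qstat_text, hidden_states=("C",)):
    """Return qstat_text without job rows whose state is in hidden_states."""
    kept = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Assumption: job rows start with a numeric job id; the header and
        # the dashed separator do not, so they always pass through.
        is_job_row = bool(fields) and fields[0][0].isdigit()
        # Assumption: the state letter is the second-to-last column.
        if is_job_row and len(fields) >= 2 and fields[-2] in hidden_states:
            continue
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    print(drop_completed(SAMPLE))
```

In practice the input would come from running qstat itself, for example via subprocess.run(["qstat"], capture_output=True, text=True).stdout. This only tidies the listing; it does nothing about the time the server spends producing it.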

> My understanding is that once a job has been completed for more than
> keep_completed seconds, pbs_server forgets about it.  Then, you have to go
> look in the logs.
Yes, and I would like to keep the jobs for one day.  That would leave
40-50k jobs in the completed state.  In a test on a slow server, a qstat
with 20k completed jobs took about 8 seconds.

> Alternatively, you could set up an epilogue to capture the exit code and
> funnel it to some user-accessible location (the job script, flat-file on a
> shared FS, database, etc).
I know I can do that, and I can ask Moab for the numbers as well.  However,
the Torque server already has the information and can store it.  So why
build some other mechanism to do this?
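
For reference, the epilogue route suggested above could look roughly like the sketch below. It assumes Torque's documented epilogue argument order, with the job id as argument 1 and the job exit code as argument 10; check this against your version. The log path and function names are hypothetical.

```python
import sys
import time

# Hypothetical location on a shared filesystem; adjust for your site.
LOG_PATH = "/shared/torque/job_exit_codes.log"


def exit_record(argv):
    """Build a 'job_id exit_code' record from epilogue-style arguments.

    Assumption: argv[1] is the job id and argv[10] is the job exit code.
    """
    return f"{argv[1]} {argv[10]}"


def append_record(argv, log_path=LOG_PATH):
    """Append a timestamped record for this job to the flat file."""
    with open(log_path, "a") as log:
        log.write(f"{int(time.time())} {exit_record(argv)}\n")


# Only write when invoked with a full set of epilogue arguments.
if __name__ == "__main__" and len(sys.argv) > 10:
    append_record(sys.argv)
```

This is exactly the kind of extra mechanism in question: it duplicates information pbs_server already holds, but it does keep exit codes available after keep_completed expires.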


> ---
> Matt Ezell
> HPC Systems Administrator
> Oak Ridge National Laboratory