[torqueusers] Question to Torque community regarding display of completed jobs in qstat

Craig Tierney - NOAA Affiliate craig.tierney at noaa.gov
Mon Dec 3 13:15:55 MST 2012


On Mon, Dec 3, 2012 at 1:09 PM, Ezell, Matthew A. <ezellma at ornl.gov> wrote:

> On 12/2/12 1:24 PM, "Craig Tierney - NOAA Affiliate" <
> craig.tierney at noaa.gov<mailto:craig.tierney at noaa.gov>> wrote:
>
> Hello all,
>
> I have a question for Torque users regarding the display of completed jobs
> in qstat.  Do others find that listing completed jobs by default in qstat
> makes the output much harder to read, and unnecessarily so?  Having
> completed jobs in qstat can also slow it down significantly when there are
> many (thousands of) completed jobs, which is another hassle.
>
> I'm asking this because I need to be able to get error codes from completed
> jobs (for minutes to hours after completion).  To do this, the jobs still
> have to be in the queue.  This function is very important, but not to
> anyone who runs qstat by hand.  Grid Engine had a way to show completed
> jobs, but only when explicitly asked for.
>
> Thanks,
> Craig
>
> Users can run 'qstat -r' to get a list of running jobs or 'qstat -i' to
> get a list of queued/held/waiting jobs.
>
>
Matt,

The above is true.  It would be nice if you could combine these options.
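Since qstat has no single flag for "running plus queued, without completed," one workaround is a small filter over qstat's default output. This is a minimal sketch, assuming the stock qstat column layout where the job state is the fifth column; the function name is illustrative, not from the thread:

```shell
# Keep the two header lines; drop any job whose state column is C
# (completed). Assumes the default six-column qstat layout, where
# column 5 is the single-letter job state.
filter_completed() {
    awk 'NR <= 2 || $5 != "C"'
}

# usage: qstat | filter_completed
```

If your site customizes qstat's output format, the column index would need to be adjusted accordingly.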


> My understanding is that once a job has been completed for more than
> keep_completed seconds, pbs_server forgets about it.  Then, you have to go
> look in the logs.
>
>
Yes, and I would like to keep jobs for one day.  That would leave 40-50k
jobs in the completed state.  In a test on a slow server with 20k completed
jobs, qstat took about 8 seconds.
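For reference, keep_completed is a retention period in seconds and can be set with qmgr; a one-day retention on a queue named batch (the queue name here is a placeholder) would look something like:

```shell
# keep completed jobs visible for 86400 seconds (one day)
qmgr -c 'set queue batch keep_completed = 86400'
```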

> Alternatively, you could set up an epilogue to capture the exit code and
> funnel it to some user-accessible location (the job script, flat-file on a
> shared FS, database, etc).
>
>
I know I can do that, and I can ask Moab for the numbers as well.  However,
the Torque server already has the information and can store it.  So why
build some other mechanism to do this?
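For reference, the epilogue approach above could be sketched roughly as follows. TORQUE invokes the epilogue with the job id as the first argument and the job's exit status as the tenth; the function name and log path here are illustrative assumptions, not from the thread:

```shell
# Hedged sketch of capturing job exit codes from an epilogue. TORQUE
# passes the job id as $1 and the job exit status as ${10}; this
# appends "<jobid> <exit status>" to a flat file on a shared
# filesystem. EXITLOG is a hypothetical path for illustration.
log_exit_status() {
    exitlog=${EXITLOG:-/shared/torque/exit_status.log}
    jobid=$1
    status=${10}
    echo "$jobid $status" >> "$exitlog"
}
```

An epilogue script would just call this with "$@"; users could then grep the flat file for their job id after the job leaves the queue.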

Thanks,
Craig

> ~Matt
>
> ---
> Matt Ezell
> HPC Systems Administrator
> Oak Ridge National Laboratory
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>