[torqueusers] Question to Torque community regarding display of completed jobs in qstat
Craig Tierney - NOAA Affiliate
craig.tierney at noaa.gov
Mon Dec 3 12:27:49 MST 2012
On Mon, Dec 3, 2012 at 12:12 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
> On 12/02/2012 01:24 PM, Craig Tierney - NOAA Affiliate wrote:
> > Hello all,
> > I have a question for Torque users regarding the display of completed
> > jobs in qstat. Do others find that listing completed jobs by default
> > in qstat makes the output much harder to search, and that it is
> > completely unnecessary? Having completed jobs in qstat can also
> > significantly slow qstat down when there are many (thousands of)
> > completed jobs, which is another hassle.
> > I am asking this because I need to be able to get error codes from
> > completed jobs (for minutes to hours after completion). To do this,
> > the jobs still have to be in the queue. This function is very important,
> > but not to anyone who runs qstat by hand. Grid Engine had a way to list
> > completed jobs, but only when asked for.
> > Thanks,
> > Craig
> Hi Craig
> Well, we keep the completed jobs on the queue for several hours, using
> qmgr -c 'set server keep_completed = ...'
> Users here never complained, and seem to like
> to see queued, running, and completed jobs.
> The old/default time of 1200 seconds was too short.
> However, our clusters and the number of users are small,
> nothing like Zeus, so the clutter caused by keeping completed
> jobs on the queue for hours is not large.
> Would 'qstat -u username' or some other filtering
> help the annoyed users?
We currently have keep_completed set to only 600 seconds, and that is too
short. We are running about 40k-50k jobs a day. While using -u username
would help, it still seems unnecessary. The jobs are not evenly
distributed between users. Some users submit hundreds of jobs in a single
workflow (spread over a few hours).
I don't mind retraining users (e.g., to use -u), but the first thing I would
do as a user would be to write a wrapper to hide them, so I figure a better
solution is in order.
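For illustration, a user-side wrapper of the kind described above might look like the following minimal sketch. It assumes the default tabular qstat layout (Job id, Name, User, Time Use, S, Queue), where the job state is the second-to-last column and completed jobs show `C`. The function names `hide_completed` and `qstat_active` are hypothetical, not part of Torque.

```python
import subprocess
from typing import Sequence

# State letter shown for completed jobs in the default qstat listing (assumed layout).
COMPLETED_STATE = "C"


def hide_completed(qstat_output: str) -> str:
    """Return qstat output with completed-job lines removed.

    Assumes the default layout: Job id, Name, User, Time Use, S, Queue.
    The header and separator lines are passed through unchanged because
    their second-to-last field is "S" or "-", never "C".
    """
    kept = []
    for line in qstat_output.splitlines():
        fields = line.split()
        # A job line has at least 6 whitespace-separated fields; the state
        # is the field just before the queue name.
        if len(fields) >= 6 and fields[-2] == COMPLETED_STATE:
            continue
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


def qstat_active(args: Sequence[str] = ()) -> str:
    """Hypothetical wrapper: run qstat with the given arguments, drop completed jobs."""
    result = subprocess.run(
        ["qstat", *args], capture_output=True, text=True, check=True
    )
    return hide_completed(result.stdout)
```

A user could call `print(qstat_active(["-u", "someuser"]), end="")` from a small script. Filtering client-side like this hides the clutter but does not address the slowdown, since qstat still has to fetch every completed job from the server.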
But breaking existing functionality is not usually a good idea, which is why
I was looking for opinions. I already have a small patch that hides
completed jobs by default and adds a -c option to show them when you
want them. But if the solution isn't generally acceptable, I don't want to
be patching my code all the time.
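The reason for keeping completed jobs in the queue at all is to read their exit codes afterwards. A minimal sketch of that lookup follows, assuming that `qstat -f <jobid>` prints a `Job Id: ...` line followed by indented `key = value` attributes and that completed jobs carry an `exit_status = N` attribute; check the output of your Torque version before relying on it. The names `parse_exit_status` and `exit_status_of` are hypothetical.

```python
import subprocess
from typing import Dict, Optional


def parse_exit_status(qstat_f_output: str) -> Dict[str, Optional[int]]:
    """Map each job id in `qstat -f` output to its exit_status (None if absent).

    Assumes each job block starts with "Job Id: <id>" and that exit_status
    appears on a single indented "exit_status = N" line.
    """
    statuses: Dict[str, Optional[int]] = {}
    job_id: Optional[str] = None
    for raw in qstat_f_output.splitlines():
        line = raw.strip()
        if line.startswith("Job Id:"):
            job_id = line.split(":", 1)[1].strip()
            statuses[job_id] = None  # no exit status yet (e.g. still running)
        elif job_id is not None and line.startswith("exit_status ="):
            statuses[job_id] = int(line.split("=", 1)[1].strip())
    return statuses


def exit_status_of(job_id: str) -> Optional[int]:
    """Hypothetical helper: query one job and return its exit status, if any.

    qstat may print a fully qualified id (e.g. 123.host.domain) that differs
    from the id passed in, so the first (only) job block is used.
    """
    result = subprocess.run(
        ["qstat", "-f", job_id], capture_output=True, text=True, check=True
    )
    return next(iter(parse_exit_status(result.stdout).values()), None)
```

This only works while the job is still known to the server, i.e. within the keep_completed window, which is why that window has to be long enough for the workflow that reads the codes.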
> Gus Correa
> torqueusers mailing list
> torqueusers at supercluster.org