[torqueusers] Question to Torque community regarding display of completed jobs in qstat

Craig Tierney - NOAA Affiliate craig.tierney at noaa.gov
Tue Dec 4 14:39:05 MST 2012

On Tue, Dec 4, 2012 at 11:59 AM, Ken Nielson <knielson at adaptivecomputing.com> wrote:

> On Tue, Dec 4, 2012 at 11:54 AM, Glen Beane <glen.beane at gmail.com> wrote:
>> On Tue, Dec 4, 2012 at 1:37 PM, David Beer <dbeer at adaptivecomputing.com> wrote:
>> >
>> >> Ken,
>> >>
>> >> This could work.  There are lots of things that could work.  My point is
>> >> that the default behavior doesn't have any value (except it already exists).
>> >> I want the users (and myself) to do as little as possible.  I asked the
>> >> question in a way I hoped would discuss if anyone else is bothered by the
>> >> default behavior.  Maybe I am the only one that cares that "qstat" generates
>> >> too much information in a way that I think is unnecessary.
>> >>
>> >> Craig
>> >>
>> >
>> > One case for not changing the default is that Moab and Maui both depend on
>> > completed jobs appearing so that they can harvest appropriate information
>> > from them. This doesn't mean we absolutely can't change it - these could be
>> > made so that they request appropriately based on TORQUE versions - but it
>> > does mean that if we did change it then we'd break backwards compatibility
>> > with older versions of Moab/Maui, which is a significant consideration.
>> >
>> But Maui and Moab don't run the qstat executable.  What if the API
>> default were to return all jobs, including complete, but we could pass
>> a flag with the request to the server from qstat so the server knows
>> if the client wants information for completed jobs. We could add a
>> qmgr setting to change the default behavior.  qstat would include some
>> extra information that would specify "give me all jobs", "give me
>> everything but complete", or "give me the server default" (which would
>> be the default behavior for qstat).  I think most of the API calls
>> allow passing "extra" information (that may not be used by many of the
>> calls).  We might be able to use this to convey this information.
> Glen,
> Hmm. You are right.
> qstat always gets all of the jobs regardless of their state and then
> formats the output based on the command line switches. Even so, changing
> default behaviour is almost always problematic. What we fix for one person
> generally breaks someone else.

I had figured out most of the behavior when looking at the code.  I had a
snippet that would ask for all job states except for "C" instead of using
the default.  Then I added an option for -c and that would only pass back
completed jobs.  I didn't go into the server code and change how it worked
because I figured that would break Moab.
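
A minimal sketch of that kind of client-side filtering, in Python rather
than the actual qstat C code.  The Job record and the select_jobs helper
are hypothetical names used only for illustration; the one thing taken
from Torque is the single-letter state code "C" for completed jobs.  The
server still returns every job, and only the display side decides what to
show:

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    state: str  # single-letter job state, e.g. "Q", "R", "C"


def select_jobs(jobs, completed_only=False):
    """Filter the full job list on the client side.

    By default completed ("C") jobs are hidden; with completed_only=True
    (the hypothetical -c option) only completed jobs are returned.
    """
    if completed_only:
        return [job for job in jobs if job.state == "C"]
    return [job for job in jobs if job.state != "C"]
```

Because the filtering happens after the server has answered, anything
else that talks to the server (a scheduler, for example) still sees the
completed jobs.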

The reasons for breaking them out are:

1) It causes (IMO) unnecessary clutter.
2) If you (well, I) want the completed job to be useful, the
keep_completed setting needs to be at least an hour, but preferably a day.
2b) When you start having thousands of jobs per hour going through the
system, the number of completed jobs goes up drastically and slows down the
qstat commands when few people really care (see #1).
3) Unless I re-architect our Torque servers, users never have any access to
the log files that hold the exit status.  Plus that still requires parsing
ASCII log files, which is not efficient (whereas keeping the exit code in
memory is).
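
To put rough numbers on point 2b, here is a back-of-the-envelope sketch.
The throughput of 2000 jobs per hour is an illustrative assumption, not a
measurement from any real system, and the helper name is hypothetical.  It
assumes jobs finish at a steady rate and that each completed job is kept
for the full retention window:

```python
def retained_completed_jobs(jobs_per_hour, keep_seconds):
    """Steady-state count of completed jobs still held by the server.

    Assumes a constant completion rate and that every completed job is
    kept for exactly keep_seconds before being purged.
    """
    return jobs_per_hour * keep_seconds // 3600


# Hypothetical throughput of 2000 jobs per hour.
one_hour = retained_completed_jobs(2000, 3600)    # keep for an hour
one_day = retained_completed_jobs(2000, 86400)    # keep for a day
print(one_hour, one_day)
```

Under those assumptions, going from an hour to a day multiplies the number
of completed entries every plain qstat has to list by 24.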

I know it is changing default behavior and isn't something that can be done
overnight.  My point was to get others to express opinions on the current
functionality and whether it is really the best thing to do.  Maybe the
change couldn't be made until 5.0, where you have a chance to break things.
Changing it in qstat means it never breaks the server, so you don't have
compatibility issues there.

