[torqueusers] TORQUE 2.5.13 release candidate

Burkhard Bunk bunk at physik.hu-berlin.de
Thu Aug 1 16:21:07 MDT 2013


Hi,

the problem shows up if a queue does not have a limit on walltime
or if an array job is queried as a whole (elapsed time is undefined,
I guess).

Example 1: no walltime limit on the queue => "Elap Time" = 00:00:00

                                               Req'd       Elap
         Job ID                  ...           Time    S   Time
         ----------------------- ----------- --------- - ---------
         4119.gargamel.physik.h  ...               --  R  00:00:00

"qstat -f" does show the correct resources_used.walltime.
The info is there, the problem is with the display by qstat on the client 
side.

Example 2: compact display of array job shows "Elap Time" = "Req'd Time".

                                               Req'd       Elap
         Job ID                  ...           Time    S   Time
         ----------------------- ----------- --------- - ---------
         60603[].irz14.physik.h  ...         720:00:00 R 720:00:00

If "Req'd Time" is unset, I see "Elap Time" = 00:00:00.
I would prefer the behavior of 2.5.11, where "Elap Time" shows "--".

Explicit display of array members (qstat -at) is correct:

         60603[2].irz14.physik.  ...         720:00:00 R  08:00:25


All that is easy to understand, given the fact that qstat.c in 2.5.13 
tries to compute

 	elap_time = req_walltime - rem_walltime

which doesn't work if "Req'd Time" or the remaining time is undefined
(the missing value falls back to zero).
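
A guarded version of that computation might look like the sketch below.
It is not the actual qstat.c code: format_elapsed is a hypothetical helper,
and the convention that a negative value marks an attribute the server did
not report is an assumption made only for illustration. The idea is to
print "--" (the 2.5.11 behavior) whenever the subtraction has no meaning.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from qstat.c.
 * Assumption: req_walltime and rem_walltime are in seconds, and a
 * negative value means the attribute was not reported for this job.
 */
static void format_elapsed(char *buf, size_t len,
                           long req_walltime, long rem_walltime)
{
  long elap;

  /* Without both values the difference is meaningless: show "--". */
  if (req_walltime < 0 || rem_walltime < 0 || rem_walltime > req_walltime)
    {
    snprintf(buf, len, "--");
    return;
    }

  elap = req_walltime - rem_walltime;

  /* Render as HH:MM:SS; hours may exceed two digits for long jobs. */
  snprintf(buf, len, "%02ld:%02ld:%02ld",
           elap / 3600, (elap % 3600) / 60, elap % 60);
}
```

With a requested walltime of 720 hours (2592000 s) and 2563175 s remaining,
this yields 08:00:25, matching the array member shown above. With either
value unreported it yields "--" instead of 00:00:00.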

Regards,
Burkhard Bunk.
----------------------------------------------------------------------
  bunk at physik.hu-berlin.de      Physics Institute, Humboldt University
  fax:    ++49-30 2093 7628     Newtonstr. 15
  phone:  ++49-30 2093 7980     12489 Berlin, Germany
----------------------------------------------------------------------

On Thu, 1 Aug 2013, Ken Nielson wrote:

> Burkhard,
> 
> I looked at the output of qstat -a and the elapsed time seems to be working as
> expected to me. What are you getting and what do you expect?
> 
> Ken
> 
> On Thu, Aug 1, 2013 at 11:09 AM, Burkhard Bunk <bunk at physik.hu-berlin.de>
> wrote:
>       Hi,
>
>       unfortunately the behavior of "Elapsed Time" in "qstat -a" is still
>       slightly screwed up, as reported two days ago.
>
>       The problem is in src/cmds/qstat.c:
>       "eltimewal" was replaced by "elap_time_string", which is computed
>       from "rem_walltime" in
>
>           878     if ((*jstate != 'Q') && (*jstate != 'C'))
>           879       {
>           880       elap_time = req_walltime - rem_walltime;
>           881       time_to_string(elap_time_string, elap_time);
>           882       }
>
>       This will fail if "req_walltime" is not available.
>
>       I fixed it by reverting to "eltimewal" in the print statements
>       (lines 904 and 924), and it works for me.
>
>       This is a preliminary hack, not a real fix. I'm sure that the
>       "elap_time_string" mechanism was introduced for a reason (which I
>       don't understand), and it should be fixed accordingly.
>
>       Regards,
>       Burkhard Bunk.
>       ----------------------------------------------------------------------
>        bunk at physik.hu-berlin.de      Physics Institute, Humboldt University
>        fax:    ++49-30 2093 7628     Newtonstr. 15
>        phone:  ++49-30 2093 7980     12489 Berlin, Germany
>       ----------------------------------------------------------------------
>
>       On Wed, 31 Jul 2013, Ken Nielson wrote:
>
>             Hi all,
>
>             I have fixed the last reported bug with 2.5.13. The download is
>             available through GitHub. You can download it with the following
>             command.
>
>             git clone https://github.com/adaptivecomputing/torque.git -b 2.5.13 2.5.13
>
>             Please let us know right away of any major problems.
>
>             Regards
>
>             --
>             Ken Nielson
>             +1 801.717.3700 office +1 801.717.3738 fax
>             1712 S. East Bay Blvd, Suite 300  Provo, UT  84606
>             www.adaptivecomputing.com

