[torqueusers] Torque 2.5.13 does not give resources_used in Accounting strings anymore?

David Beer dbeer at adaptivecomputing.com
Tue Oct 22 16:36:16 MDT 2013


It appears that this is a bug that crept into the 2.5 source. Martin is
correct that this change should simply be reverted to fix the bug.

David


On Tue, Oct 22, 2013 at 3:51 PM, Martin Siegert <siegert at sfu.ca> wrote:

> On Tue, Oct 22, 2013 at 02:27:35PM -0700, Martin Siegert wrote:
> > On Tue, Oct 22, 2013 at 11:06:41PM +0200, Burkhard Bunk wrote:
> > > Hi,
> > >
> > > with your findings in mind, I checked my installations. I haven't used
> > > accounting so far, but scanning through the accounting files, I can
> > > confirm your observation.
> > >
> > > My installations used 2.5.11 until July 2013, when I pulled 2.5.13 from
> > > git and rebuilt my packages. After the update, the accounting records
> > > don't contain "resources_used" clauses anymore.
> > >
> > > My distribution is now Debian 7 (32 and 64 bit), but an older server
> > > is still on Debian 6 (32 bit); all of them show the same symptoms.
> > >
> > > Regards,
> > > Burkhard Bunk.
> > > ----------------------------------------------------------------------
> > >   bunk at physik.hu-berlin.de      Physics Institute, Humboldt University
> > >   fax:    ++49-30 2093 7628     Newtonstr. 15
> > >   phone:  ++49-30 2093 7980     12489 Berlin, Germany
> > > ----------------------------------------------------------------------
> > >
> > > On Tue, 22 Oct 2013, Grigory Shamov wrote:
> > >
> > > > Hi,
> > > >
> > > > For some reason, our Torque 2.5 stopped reporting the used
> > > > resources in $SERVER_PRIV/accounting. For finished jobs it now
> > > > writes something like this:
> > > >
> > > > 10/09/2013 23:59:53;E;YYYYYYY;user=XXX group=fazioja jobname=NAME_pseudo queue=default ctime=1381327807 qtime=1381327807 etime=1381327807 start=1381360466 owner=XXX@ZZZ exec_host=n181/11 Resource_List.mem=20gb Resource_List.opsys=RHEL6 Resource_List.pmem=256mb Resource_List.procs=1 Resource_List.walltime=80:00:00 session=30801 end=1381381193 Exit_status=0
> > > >
> > > > The only change I can recall is updating from 2.5.12 to 2.5.13
> > > > to address the vulnerability and mom segfault issues. I built it with
> > > > exactly the same configure parameters as before (but on a different
> > > > CentOS version, 6 instead of 5).
> > > >
> > > > Before I updated it, there were fields like
> > > > "resources_used.cput=00:05:40 resources_used.mem=232748kb
> > > > resources_used.vmem=10462620kb resources_used.walltime=00:01:10" right
> > > > after the Exit_status field. Now they have disappeared.
> > > >
> > > > Did anything change between 2.5.12 and 2.5.13 that could cause this?
> > > > Or is there a setting I could have tripped accidentally? Does anyone
> > > > run Torque 2.5.13, and if so, do you get the complete accounting
> > > > strings?
> >
> > I suspect that the following change is responsible:
> >
> > # diff -u torque-2.5.12/src/server/req_jobobit.c torque-2.5.13/src/server/req_jobobit.c
> > --- torque-2.5.12/src/server/req_jobobit.c      2011-10-05 16:20:11.000000000 -0700
> > +++ torque-2.5.13/src/server/req_jobobit.c      2013-08-01 09:10:01.000000000 -0700
> > @@ -2237,7 +2237,9 @@
> >    char   acctbuf[RESC_USED_BUF];
> >    int    accttail;
> >    int    exitstatus;
> > +#ifdef USESAVEDRESOURCES
> >    int    have_resc_used = FALSE;
> > +#endif
> >    char   mailbuf[RESC_USED_BUF];
> >    int    newstate;
> >    int    newsubst;
> > @@ -2399,10 +2401,10 @@
> >
> >    accttail = strlen(acctbuf);
> >
> > -  have_resc_used = get_used(patlist, acctbuf);
> >
> >  #ifdef USESAVEDRESOURCES
> >
> > +  have_resc_used = get_used(patlist, acctbuf);
> >    /* if we don't have resources from the obit, use what the job already had */
> >
> >    if (!have_resc_used)
> >
> > I am guessing that the flag -DUSESAVEDRESOURCES is missing: with
> > torque-2.5.13, get_used() is only called when it is defined.
>
> I just looked at the torque-4.2.5 code, and it corresponds to the
> torque-2.5.12/src/server/req_jobobit.c version. Thus, I would simply revert
> the change, i.e., copy torque-2.5.12/src/server/req_jobobit.c over
> torque-2.5.13/src/server/req_jobobit.c and recompile.
>
> Cheers,
> Martin
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>



-- 
David Beer | Senior Software Engineer
Adaptive Computing