[torqueusers] Torque 2.5.13 does not give resources_used in Accounting strings anymore?

Grigory Shamov gas5x at yahoo.com
Wed Oct 23 08:48:37 MDT 2013


Dear Martin,
Dear David,

Thanks a lot for your help!

--
Grigory Shamov

--------------------------------------------
On Tue, 10/22/13, David Beer <dbeer at adaptivecomputing.com> wrote:

 Subject: Re: [torqueusers] Torque 2.5.13 does not give resources_used in Accounting strings anymore?
 To: "Torque Users Mailing List" <torqueusers at supercluster.org>
 Date: Tuesday, October 22, 2013, 3:36 PM
 
 It appears that this is a bug that crept into the 2.5 source. Martin is
 correct that this change should simply be reverted to fix the bug.
 David
 
 
 On Tue, Oct 22, 2013 at 3:51 PM, Martin Siegert <siegert at sfu.ca> wrote:
 
 On Tue, Oct 22, 2013 at 02:27:35PM -0700, Martin Siegert wrote:
 > On Tue, Oct 22, 2013 at 11:06:41PM +0200, Burkhard Bunk wrote:
 > > Hi,
 > >
 > > with your findings in mind, I checked my installations. I haven't used
 > > accounting so far, but scanning through the accounting files, I can
 > > confirm your observation.
 > >
 > > My installations used 2.5.11 until July 2013, when I pulled 2.5.13 from
 > > git and rebuilt my packages. After the update, the accounting records
 > > don't contain "resources_used" clauses anymore.
 > >
 > > My distribution is Debian 7 by now (32 and 64 bit), but an older server
 > > is still on Debian 6 (32 bit), all with the same symptoms.
 > >
 > > Regards,
 > > Burkhard Bunk.
 > >
 > > ----------------------------------------------------------------------
 > >   bunk at physik.hu-berlin.de     Physics Institute, Humboldt University
 > >   fax:    ++49-30 2093 7628       Newtonstr. 15
 > >   phone:  ++49-30 2093 7980       12489 Berlin, Germany
 > > ----------------------------------------------------------------------
 > >
 > > On Tue, 22 Oct 2013, Grigory Shamov wrote:
 > >
 > > > Hi,
 > > >
 > > > For some reason, our Torque 2.5 stopped reporting the used resources
 > > > in $SERVER_PRIV/accounting. For finished jobs it now has something
 > > > like this:
 > > >
 > > > 10/09/2013 23:59:53;E;YYYYYYY;user=XXX group=fazioja jobname=NAME_pseudo queue=default ctime=1381327807 qtime=1381327807 etime=1381327807 start=1381360466 owner=XXX at ZZZ exec_host=n181/11 Resource_List.mem=20gb Resource_List.opsys=RHEL6 Resource_List.pmem=256mb Resource_List.procs=1 Resource_List.walltime=80:00:00 session=30801 end=1381381193 Exit_status=0
 > > >
 > > > The only change I can recall was updating from 2.5.12 to 2.5.13 to
 > > > address the vulnerability and mom_segfaults issues. I built it with
 > > > exactly the same configure parameters as before (but on a different
 > > > CentOS version, 6 instead of 5).
 > > >
 > > > Before the update, there were fields like "resources_used.cput=00:05:40
 > > > resources_used.mem=232748kb resources_used.vmem=10462620kb
 > > > resources_used.walltime=00:01:10" right after the Exit_status field.
 > > > Now they have disappeared.
 > > >
 > > > Did anything change between 2.5.12 and 2.5.13 that could cause this?
 > > > Or is there a setting that I could have tripped accidentally? Does
 > > > anyone run Torque 2.5.13, and if so, do you have the complete
 > > > accounting strings?
 
 
 >
 > I suspect that the following change is responsible:
 >
 > # diff -u torque-2.5.12/src/server/req_jobobit.c torque-2.5.13/src/server/req_jobobit.c
 > --- torque-2.5.12/src/server/req_jobobit.c	2011-10-05 16:20:11.000000000 -0700
 > +++ torque-2.5.13/src/server/req_jobobit.c	2013-08-01 09:10:01.000000000 -0700
 > @@ -2237,7 +2237,9 @@
 >    char   acctbuf[RESC_USED_BUF];
 >    int    accttail;
 >    int    exitstatus;
 > +#ifdef USESAVEDRESOURCES
 >    int    have_resc_used = FALSE;
 > +#endif
 >    char   mailbuf[RESC_USED_BUF];
 >    int    newstate;
 >    int    newsubst;
 > @@ -2399,10 +2401,10 @@
 >
 >    accttail = strlen(acctbuf);
 >
 > -  have_resc_used = get_used(patlist, acctbuf);
 >
 >  #ifdef USESAVEDRESOURCES
 >
 > +  have_resc_used = get_used(patlist, acctbuf);
 >    /* if we don't have resources from the obit, use what the job already had */
 >
 >    if (!have_resc_used)
 >
 > I am guessing that the flag -DUSESAVEDRESOURCES is missing, but
 > necessary with torque-2.5.13.
 
 
 
 I just looked at the torque-4.2.5 code, and it corresponds to the
 torque-2.5.12/src/server/req_jobobit.c version. Thus, I would simply revert
 the change, i.e., copy torque-2.5.12/src/server/req_jobobit.c to
 torque-2.5.13/src/server/req_jobobit.c and recompile.

 Cheers,
 Martin
 
 _______________________________________________
 torqueusers mailing list
 torqueusers at supercluster.org
 http://www.supercluster.org/mailman/listinfo/torqueusers
 
 
 
 
 -- 
 David Beer | Senior Software Engineer
 Adaptive Computing
 
 
 

