[torqueusers] Torque 2.5.13 does not give resources_used in Accounting strings anymore?

Martin Siegert siegert at sfu.ca
Tue Oct 22 15:51:50 MDT 2013


On Tue, Oct 22, 2013 at 02:27:35PM -0700, Martin Siegert wrote:
> On Tue, Oct 22, 2013 at 11:06:41PM +0200, Burkhard Bunk wrote:
> > Hi,
> > 
> > with your findings in mind, I checked my installations. I didn't use
> > accounting so far, but scanning through the accounting files, I can 
> > confirm your observation.
> > 
> > My installations used 2.5.11 until July 2013, when I pulled 2.5.13 from
> > git and rebuilt my packages. After the update, the accounting records
> > don't contain "resources_used" clauses anymore.
> > 
> > My distribution is Debian 7 by now (32 and 64 bit), but an older server
> > is still on Debian 6 (32 bit), all with the same symptoms.
> > 
> > Regards,
> > Burkhard Bunk.
> > ----------------------------------------------------------------------
> >   bunk at physik.hu-berlin.de      Physics Institute, Humboldt University
> >   fax:    ++49-30 2093 7628     Newtonstr. 15
> >   phone:  ++49-30 2093 7980     12489 Berlin, Germany
> > ----------------------------------------------------------------------
> > 
> > On Tue, 22 Oct 2013, Grigory Shamov wrote:
> > 
> > > Hi,
> > >
> > > For some reason, our Torque 2.5 stopped reporting the used resources in $SERVER_PRIV/accounting. For finished jobs it now writes something like this:
> > >
> > > 10/09/2013 23:59:53;E;YYYYYYY;user=XXX group=fazioja jobname=NAME_pseudo queue=default ctime=1381327807 qtime=1381327807 etime=1381327807 start=1381360466 owner=XXX at ZZZ exec_host=n181/11 Resource_List.mem=20gb Resource_List.opsys=RHEL6 Resource_List.pmem=256mb Resource_List.procs=1 Resource_List.walltime=80:00:00 session=30801 end=1381381193 Exit_status=0
> > >
> > > The only change I can recollect was updating from 2.5.12 to 2.5.13 to address the vulnerability and mom_segfaults issues. I built it with exactly the same configure parameters as before (but on a different CentOS version, 6 instead of 5).
> > >
> > > Before the update, there were fields like "resources_used.cput=00:05:40 resources_used.mem=232748kb resources_used.vmem=10462620kb resources_used.walltime=00:01:10" right after the Exit_status field. Now they have disappeared.
> > >
> > > Did anything change between 2.5.12 and 2.5.13 that could cause this? Or is there a setting that I could have tripped accidentally? Does anyone run Torque 2.5.13, and if so, do you have the complete accounting strings?
> 
> I suspect that the following change is responsible:
> 
> # diff -u torque-2.5.12/src/server/req_jobobit.c torque-2.5.13/src/server/req_jobobit.c
> --- torque-2.5.12/src/server/req_jobobit.c      2011-10-05 16:20:11.000000000 -0700
> +++ torque-2.5.13/src/server/req_jobobit.c      2013-08-01 09:10:01.000000000 -0700
> @@ -2237,7 +2237,9 @@
>    char   acctbuf[RESC_USED_BUF];
>    int    accttail;
>    int    exitstatus;
> +#ifdef USESAVEDRESOURCES
>    int    have_resc_used = FALSE;
> +#endif
>    char   mailbuf[RESC_USED_BUF];
>    int    newstate;
>    int    newsubst;
> @@ -2399,10 +2401,10 @@
>  
>    accttail = strlen(acctbuf);
>  
> -  have_resc_used = get_used(patlist, acctbuf);
>  
>  #ifdef USESAVEDRESOURCES
>  
> +  have_resc_used = get_used(patlist, acctbuf);
>    /* if we don't have resources from the obit, use what the job already had */
>  
>    if (!have_resc_used)
> 
> I am guessing that the flag -DUSESAVEDRESOURCES is now required with
> torque-2.5.13 and is missing from the build.

I just looked at the torque-4.2.5 code, and it corresponds to the
torque-2.5.12/src/server/req_jobobit.c version. Thus, I would simply revert
the change, i.e., copy torque-2.5.12/src/server/req_jobobit.c to
torque-2.5.13/src/server/req_jobobit.c and recompile.
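
To illustrate why moving the get_used() call inside the #ifdef would have
this effect, here is a minimal, self-contained sketch. It is not the Torque
source: append_used() is a hypothetical stand-in for get_used(), and the
build_record_* functions and RECORD_HEAD are made-up names that only mimic
where the call sits in each version.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ACCT_BUF_SIZE 512

/* Illustrative record prefix, taken from the tail of the sample record above. */
#define RECORD_HEAD "session=30801 end=1381381193 Exit_status=0"

/* Hypothetical stand-in for get_used(): appends usage fields to buf.
 * Returns 1 if the fields were appended, 0 if they did not fit. */
static int append_used(char *buf, size_t size)
{
    static const char used[] =
        " resources_used.cput=00:05:40 resources_used.walltime=00:01:10";
    size_t len;

    assert(buf != NULL);
    len = strlen(buf);
    if (len + sizeof(used) > size)      /* sizeof includes the trailing NUL */
        return 0;
    memcpy(buf + len, used, sizeof(used));
    return 1;
}

/* Layout as in 2.5.12: the usage fields are appended unconditionally. */
static void build_record_2_5_12(char *buf, size_t size)
{
    int have_resc_used;

    snprintf(buf, size, "%s", RECORD_HEAD);
    have_resc_used = append_used(buf, size);
    /* The real server only consults this value under USESAVEDRESOURCES. */
    (void)have_resc_used;
}

/* Layout as in 2.5.13: the call itself sits inside the #ifdef, so a build
 * without -DUSESAVEDRESOURCES never appends the usage fields. */
static void build_record_2_5_13(char *buf, size_t size)
{
    snprintf(buf, size, "%s", RECORD_HEAD);
#ifdef USESAVEDRESOURCES
    {
        int have_resc_used = append_used(buf, size);
        (void)have_resc_used;
    }
#endif
}
```

Calling both functions on separate buffers and printing the results shows
the difference: without -DUSESAVEDRESOURCES only the 2.5.12-style record
carries the resources_used fields, while compiling with the flag makes the
two records identical.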

Cheers,
Martin
