[torqueusers] Re: pbs_mom logging loads of Success(0) get_proc_stat

Garrick Staples garrick at usc.edu
Mon Dec 10 11:40:46 MST 2007


On Fri, Dec 07, 2007 at 06:28:25PM +0100, Michael Meier alleged:
> >Mom logs stuff like that:
> >>12/07/2007 00:04:10;0001;   pbs_mom;Svr;pbs_mom;Success (0) in 
> >>cput_sum, 7058: get_proc_stat
> >the mom tries to parse that line in the following way (from 
> >torque-2.3.0-snap.200712061242/src/resmom/linux/mom_mach.c):
> >fscanf(fd,"%d (%[^)]) %c %d %d %d
> >That will probably break on parsing the '(ib_fmr(mthca0))', because it 
> >will assume the first ')' is the closing bracket. Which is just not true.
> >'man 5 proc' suggests to use '%s', but that will be even worse than the 
> >current '%[^)]', breaking on every executable name that contains a 
> >space. And what if someone wants to run a monster like the following:
> >6849 (te (s)( ))t)) S 25614 6849 25614 34838 6849 4194304 161 0 0 0 0 0 
> >0 0 20 0 1 0 36168980 2564096 77 18446744073709551615 4194304 4195956 
> >140736421683184 18446744073709551615 47252866936498 0 0 0 0 0 0 0 17 0 0 
> >0 0
> >The only proper fix would probably be to look for the last ')' in the 
> >whole string.
> 
> And here's my suggestion for a patch. Patchfile is against torque 2.2.1.
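
For illustration only (this is not the patch referred to above, and not code from mom_mach.c): a minimal sketch of the "look for the last ')'" idea, assuming the whole stat line has already been read into a string. The struct stat_head, the function parse_stat_line and the 64-byte name buffer are hypothetical names and sizes chosen for this example, and only the first few fields are extracted.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the first four fields of a stat line. */
struct stat_head {
    int  pid;
    char comm[64];   /* arbitrary size chosen for this sketch */
    char state;
    int  ppid;
};

/* Parse "pid (name) state ppid ..." by taking everything between the
 * first '(' and the LAST ')' as the name, so that names containing
 * spaces or parentheses do not confuse the parser.
 * Returns 0 on success, -1 if the line does not have that shape. */
static int parse_stat_line(const char *line, struct stat_head *out)
{
    const char *open  = strchr(line, '(');
    const char *close = strrchr(line, ')');
    size_t len;

    if (open == NULL || close == NULL || close < open)
        return -1;

    /* The pid precedes the opening bracket. */
    if (sscanf(line, "%d", &out->pid) != 1)
        return -1;

    /* Copy the name, truncating it if it does not fit the buffer. */
    len = (size_t)(close - open - 1);
    if (len >= sizeof(out->comm))
        len = sizeof(out->comm) - 1;
    memcpy(out->comm, open + 1, len);
    out->comm[len] = '\0';

    /* The remaining fields are plain whitespace-separated tokens. */
    if (sscanf(close + 1, " %c %d", &out->state, &out->ppid) != 2)
        return -1;

    return 0;
}
```

Given the "monster" line quoted above, parse_stat_line() should yield the name "te (s)( ))t)", state 'S' and parent pid 25614. This relies on none of the fields after the name containing a ')', which holds for the purely numeric fields shown in the example line.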

Egads!  An entirely non-backwards compatible problem.  That's another reason
why IB sucks!


We'll need some run-time detection.  I suggest something like this:
  patt1="original scanf patt"
  patt2="awful IB patt"
  patt=patt1

  ...
  if (scanf(patt) == failed)
    if (scanf(patt1) != failed)
      patt=patt1
    elseif (scanf(patt2) != failed)
      patt=patt2
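
In C, the fallback above might look roughly like the following. This is a sketch under assumptions, not the real mom_mach.c code: both format strings are hypothetical stand-ins (the second handles exactly one nested pair of parentheses, as in "(ib_fmr(mthca0))"), the name itself is skipped rather than stored, and the line is assumed to be in a string already so that sscanf can be retried on it.

```c
#include <stdio.h>

/* Hypothetical patterns, not the ones used in mom_mach.c. Each reads the
 * pid, skips the parenthesised name, then reads the state and parent pid. */
static const char *const stat_patts[] = {
    "%d (%*[^)]) %c %d",          /* plain name, e.g. "(bash)" */
    "%d (%*[^(](%*[^)])) %c %d"   /* one nested pair, e.g. "(ib_fmr(mthca0))" */
};
enum { NPATTS = sizeof(stat_patts) / sizeof(stat_patts[0]) };

/* Index of the pattern that worked most recently; it is tried first. */
static int cur_patt = 0;

/* Try the current pattern, then the others in turn, and remember the
 * first one that matches. Returns 0 on success, -1 if none matches.
 * The output values are only meaningful when 0 is returned. */
static int scan_stat(const char *line, int *pid, char *state, int *ppid)
{
    int idx = cur_patt;
    int tries;

    for (tries = 0; tries < NPATTS; tries++) {
        /* Three assigned items means the whole pattern matched. */
        if (sscanf(line, stat_patts[idx], pid, state, ppid) == 3) {
            cur_patt = idx;
            return 0;
        }
        idx = (idx + 1) % NPATTS;
    }
    return -1;
}
```

A name such as "te (s)( ))t)" matches neither pattern, so scan_stat() returns -1 for it. Run-time detection of this kind only covers the name shapes that someone has written a pattern for.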
  
