[torqueusers] Re: pbs_mom logging loads of Success(0) get_proc_stat

Michael Meier Michael.Meier at rrze.uni-erlangen.de
Mon Dec 10 13:25:52 MST 2007


>>> Mom logs stuff like that:
>>>> 12/07/2007 00:04:10;0001;   pbs_mom;Svr;pbs_mom;Success (0) in 
>>>> cput_sum, 7058: get_proc_stat
>>> the mom tries to parse that line in the following way (from 
>>> torque-2.3.0-snap.200712061242/src/resmom/linux/mom_mach.c):
>>> fscanf(fd,"%d (%[^)]) %c %d %d %d
>>> That will probably break on parsing the '(ib_fmr(mthca0))'
>>> The only proper fix would probably be to look for the last ')' in the 
>>> whole string.
>> And here's my suggestion for a patch. Patchfile is against torque 2.2.1.
> Egads!  An entirely non-backwards compatible problem.  That's another reason
> why IB sucks!

In what way is that non-backwards-compatible? Were there ever Linux 
versions where a ')' appears in any place after the process name string? 
Unless there were, my patch in no way alters torque's behaviour - except 
that it no longer breaks when special characters appear in a process name.
And although I don't think it's really a good idea to use brackets in 
the name, it's still valid, so you can't blame IB. It's not like the 
driver is doing something only a kernel driver could do. Any cluster 
user could just name his binary 'hi (there)', run it and confuse torque 
with it. Linux in no way prohibits or filters spaces, '(' or ')'.
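
To illustrate the "look for the last ')'" idea, here is a minimal sketch. 
It is not the actual patch and not the code in mom_mach.c: the struct 
stat_head, the function parse_stat_head() and the sample lines are made up 
for this example, and only the first four fields of a /proc/<pid>/stat line 
are read. The assumption is that no field after the process name can 
contain a ')'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for the first four fields of /proc/<pid>/stat. */
struct stat_head {
    int  pid;
    char comm[64];   /* process name without the enclosing brackets */
    char state;
    int  ppid;
};

/*
 * Parse the start of a stat line. The process name is taken to be
 * everything between the first '(' and the LAST ')' in the line, so
 * brackets and spaces inside the name do not confuse the parser.
 * Returns 0 on success, -1 if the line cannot be parsed.
 */
int parse_stat_head(const char *line, struct stat_head *out)
{
    const char *open, *close;
    size_t len;

    assert(line != NULL && out != NULL);

    open  = strchr(line, '(');    /* first '(' opens the name */
    close = strrchr(line, ')');   /* last ')' closes the name */
    if (open == NULL || close == NULL || close < open)
        return -1;

    /* The pid is the number in front of the name. */
    if (sscanf(line, "%d", &out->pid) != 1)
        return -1;

    /* Copy the name, truncating it if it does not fit. */
    len = (size_t)(close - open - 1);
    if (len >= sizeof(out->comm))
        len = sizeof(out->comm) - 1;
    memcpy(out->comm, open + 1, len);
    out->comm[len] = '\0';

    /* The remaining fields are parsed from behind the last ')'. */
    if (sscanf(close + 1, " %c %d", &out->state, &out->ppid) != 2)
        return -1;

    return 0;
}
```

With a made-up input line such as "7058 (ib_fmr(mthca0)) S 2 0 0" this 
yields the name "ib_fmr(mthca0)", and a binary called 'hi (there)' is 
handled the same way. A "%[^)]" conversion would stop at the first ')' 
instead, and the fields after the name would then fail to parse.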
-- 
Michael Meier, HPC Services
Friedrich-Alexander-Universitaet Erlangen-Nuernberg
Regionales Rechenzentrum Erlangen
Martensstrasse 1, 91058 Erlangen, Germany
Tel.: +49 9131 85-28973, Fax: +49 9131 302941
michael.meier at rrze.uni-erlangen.de
www.rrze.uni-erlangen.de/hpc/

