[torquedev] Torque (server) bug in parsing nodecounts

Michael Meier Michael.Meier at rrze.uni-erlangen.de
Thu May 14 02:25:36 MDT 2009


[I would have preferred to handle this via bugzilla, but it's still down]
One of our users recently triggered a bad bug in torque. She submitted a 
jobscript requesting 128128128128[repeat 128 times]128128 nodes, i.e. a node 
count with 384 digits, after hitting a few wrong keys in 'vi'.
The result of this tiny user error was pretty bad, however: instead of just 
rejecting the job, the torque server accepted it, then segfaulted. Each 
subsequent attempt to restart the torque server segfaulted again, because the 
server had already written the .JB file to its 'jobs' dir and crashed when 
attempting to reread it.
I've since been able to track down the cause of this crash: the static 
function "number" in src/server/node_manager.c. The function looks like it 
was written by a 6-year-old on crack doing his first C program. It performs 
no sanity or boundary checks; input that is too long is just happily copied 
over the stack.
Attached you will find a quick patch against yesterday's 2.3.7 snapshot that 
fixes at least the worst errors in this function - and prevents the crash in 
the case mentioned above.
However, the whole function in its current form is probably redundant: it 
seems to be just a (crappy!) reimplementation of strto(u)l. At least for a 
non-hotfix-release it would probably be better to use that instead of 
reinventing the wheel.
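To illustrate the strtoul idea, here is a rough sketch of what a replacement 
could look like. It is not the attached patch and not the existing torque 
code: the name parse_node_count, its signature and its return convention are 
all made up for this example, and I am assuming the node count has to fit 
into an int.

```c
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of torque): parse a leading decimal node
 * count from str.
 *
 * Returns 0 on success, storing the value in *count and, if end is not
 * NULL, a pointer to the first unparsed character in *end.
 * Returns -1 if str does not start with a digit or the value does not fit
 * into an int (assumed limit for this sketch).
 */
int parse_node_count(const char *str, int *count, const char **end)
{
    char *stop;
    unsigned long value;

    /* strtoul() itself would skip whitespace and accept a sign, so insist
     * on a digit as the very first character. */
    if (str == NULL || !isdigit((unsigned char)str[0]))
        return -1;

    errno = 0;
    value = strtoul(str, &stop, 10);

    /* ERANGE is set when the digits overflow unsigned long, which is what
     * happens with the 384-digit input. No buffer is involved at all. */
    if (errno == ERANGE || value > (unsigned long)INT_MAX)
        return -1;

    *count = (int)value;
    if (end != NULL)
        *end = stop;
    return 0;
}
```

With something like this, a spec such as "4:ppn=2" would yield 4 with *end 
pointing at the ':', while the 384-digit string is rejected with -1 instead 
of being copied anywhere. The caller still has to turn that -1 into a proper 
job rejection.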
-- 
Michael Meier, HPC Services
Friedrich-Alexander-Universitaet Erlangen-Nuernberg
Regionales Rechenzentrum Erlangen
Martensstrasse 1, 91058 Erlangen, Germany
Tel.: +49 9131 85-28973, Fax: +49 9131 302941
michael.meier at rrze.uni-erlangen.de
www.rrze.uni-erlangen.de/hpc/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: torque-nodemanager.patch
Type: text/x-diff
Size: 554 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/torquedev/attachments/20090514/c5253717/attachment.bin 

