Bug 173

Summary: [torque-3.0.4] pbs_mom buffer overflow / segfaults when using --enable-nvidia-gpus [with BUG FIX]
Product: TORQUE Reporter: Nicolas Pinto <nicolas.pinto>
Component: pbs_momAssignee: Ken Nielson <knielson>
Status: NEW    
Severity: critical CC: dbeer, rea+maui, torquedev
Priority: P5    
Version: 3.0.x   
Hardware: PC   
OS: Linux   

Description Nicolas Pinto 2012-01-31 20:24:34 MST
There is a buffer overflow in pbs_mom when using --enable-nvidia-gpus and a
large number of GPUs (e.g. 8):

# with $loglevel 7
$ tail /var/log/messages
Jan 31 22:04:56 munctional6 pbs_mom: LOG_DEBUG::check_nvidia_version_file,
Nvidia driver info: NVRM version: NVIDIA UNIX x86_64 Kernel Module 290.10 Wed
Nov 16 17:39:29 PST 2011
Jan 31 22:04:56 munctional6 pbs_mom: LOG_DEBUG::gpus, gpus: GPU cmd issued:
nvidia-smi -q -x 2>&1
Jan 31 22:04:59 munctional6 kernel: [ 4186.963718] pbs_mom[7497] general
protection ip:41dd69 sp:7fff2ccf72b8 error:0 in pbs_mom[400000+56000]

src/resmom/mom_server.c:2507 only allocates 16 KB whereas the output of
"nvidia-smi -q -x 2>&1" is ~24KB

Bug fix:
The following simple patch fixes the issue: 

--- src/resmom/mom_server.c.orig        2012-01-12 17:18:49.000000000 -0500
+++ src/resmom/mom_server.c     2012-01-31 22:24:01.179534519 -0500
@@ -2504,7 +2504,7 @@
   static char id[] = "generate_server_gpustatus_smi";

   char *dataptr, *outptr, *tmpptr1, *tmpptr2, *savptr;
-  char gpu_string[16 * 1024];
+  char gpu_string[32 * 1024];
   int  gpu_modes[32];
   int  have_modes = FALSE;
   int  gpuid = -1;


Comment 1 Eygene Ryabinkin 2012-01-31 20:59:30 MST
Not that easy: in reality, gpus() must be fixed to respect the passed
buffer_size.  Or, better, it should do memory allocation/reallocation by itself
and return the dynamic buffer to the caller to avoid problems with incompletely
captured output from NVidia SMI tools.
Comment 2 Nicolas Pinto 2012-01-31 21:35:50 MST
Agreed. What you suggest would be the correct way to handle this. I just hacked
something in a way that is "compatible" with the current implementation.
Comment 3 Nicolas Pinto 2012-04-27 21:25:11 MDT
Any update on this front? Is 4.0.0 vulnerable to this bug?
Comment 4 David Beer 2012-04-30 09:55:06 MDT
This hasn't been re-written to use the dynamic string struct yet, but it should
happen soon.