[torquedev] [Bug 173] New: [torque-3.0.4] pbs_mom buffer overflow / segfaults when using --enable-nvidia-gpus [with BUG FIX]

bugzilla-daemon at supercluster.org bugzilla-daemon at supercluster.org
Tue Jan 31 20:24:35 MST 2012


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=173

           Summary: [torque-3.0.4] pbs_mom buffer overflow / segfaults
                    when using --enable-nvidia-gpus [with BUG FIX]
           Product: TORQUE
           Version: 3.0.x
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P5
         Component: pbs_mom
        AssignedTo: knielson at adaptivecomputing.com
        ReportedBy: nicolas.pinto at gmail.com
                CC: torquedev at supercluster.org
   Estimated Hours: 0.0


There is a buffer overflow in pbs_mom when TORQUE is built with
--enable-nvidia-gpus and the node has a large number of GPUs (e.g. 8):

Report:
-------
# with $loglevel 7
$ tail /var/log/messages
Jan 31 22:04:56 munctional6 pbs_mom: LOG_DEBUG::check_nvidia_version_file,
Nvidia driver info: NVRM version: NVIDIA UNIX x86_64 Kernel Module 290.10 Wed
Nov 16 17:39:29 PST 2011
Jan 31 22:04:56 munctional6 pbs_mom: LOG_DEBUG::gpus, gpus: GPU cmd issued:
nvidia-smi -q -x 2>&1
Jan 31 22:04:59 munctional6 kernel: [ 4186.963718] pbs_mom[7497] general
protection ip:41dd69 sp:7fff2ccf72b8 error:0 in pbs_mom[400000+56000]

Cause:
------
src/resmom/mom_server.c:2507 declares gpu_string as a fixed 16 KB buffer,
whereas the output of "nvidia-smi -q -x 2>&1" is ~24 KB on this node, so the
output does not fit.

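To illustrate the failure mode, here is a minimal sketch of a length-checked
copy. It assumes the nvidia-smi output ends up copied into gpu_string without
a size check; the actual copy code is not shown in this report, and
copy_bounded is a hypothetical helper, not a TORQUE function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of TORQUE: copy at most dst_size - 1 bytes
 * of src into dst, always NUL-terminate, and return 1 if src had to be
 * truncated (0 otherwise). */
int copy_bounded(char *dst, size_t dst_size, const char *src, size_t src_len)
{
  size_t n;

  assert(dst != NULL && src != NULL && dst_size > 0);

  /* Never write past the end of dst, whatever the size of src. */
  n = (src_len < dst_size - 1) ? src_len : dst_size - 1;
  memcpy(dst, src, n);
  dst[n] = '\0';

  return src_len > n;
}
```

With a 16 KB destination and ~24 KB of input this truncates and returns 1
instead of writing past the end of the buffer, so the caller can log the
condition rather than crash.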
Bug fix:
--------
The following simple patch fixes the issue: 

--- src/resmom/mom_server.c.orig        2012-01-12 17:18:49.000000000 -0500
+++ src/resmom/mom_server.c     2012-01-31 22:24:01.179534519 -0500
@@ -2504,7 +2504,7 @@
   static char id[] = "generate_server_gpustatus_smi";

   char *dataptr, *outptr, *tmpptr1, *tmpptr2, *savptr;
-  char gpu_string[16 * 1024];
+  char gpu_string[32 * 1024];
   int  gpu_modes[32];
   int  have_modes = FALSE;
   int  gpuid = -1;
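
Note that 32 KB only raises the limit; a node with more GPUs or a more
verbose nvidia-smi could exceed it again. As a rough sketch of an alternative
(not what the patch above does), the output could be read into a heap buffer
that grows as needed. read_all is a hypothetical helper, not a TORQUE
function.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of TORQUE: read everything from fp into a
 * heap buffer that doubles as needed. Returns a NUL-terminated buffer the
 * caller must free(), or NULL on allocation or read failure. */
char *read_all(FILE *fp, size_t *len_out)
{
  size_t cap = 4096, len = 0, n;
  char *buf, *tmp;

  assert(fp != NULL && len_out != NULL);

  buf = malloc(cap);
  if (buf == NULL)
    return NULL;

  /* Always leave one byte free for the terminating NUL. */
  while ((n = fread(buf + len, 1, cap - len - 1, fp)) > 0)
    {
    len += n;
    if (len == cap - 1)
      {
      /* Buffer is full: double it before the next read. */
      tmp = realloc(buf, cap * 2);
      if (tmp == NULL)
        {
        free(buf);
        return NULL;
        }
      buf = tmp;
      cap *= 2;
      }
    }

  if (ferror(fp))
    {
    free(buf);
    return NULL;
    }

  buf[len] = '\0';
  *len_out = len;
  return buf;
}
```

On a POSIX system the stream could come from
popen("nvidia-smi -q -x 2>&1", "r"), closed with pclose() once read_all
returns.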

HTH

Nicolas
