[torquedev] [Bug 99] New: qsub crashes with -W option and specific number of chars

bugzilla-daemon at supercluster.org
Fri Nov 12 07:30:21 MST 2010


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=99

           Summary: qsub crashes with -W option and specific number of
                    chars
           Product: TORQUE
           Version: 2.5.x
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: major
          Priority: P5
         Component: clients
        AssignedTo: glen.beane at gmail.com
        ReportedBy: maarten at sara.nl
                CC: torquedev at supercluster.org
   Estimated Hours: 0.0


Hi,

When I run qsub with the -W option, it crashes with the following error:

*** glibc detected *** qsub: double free or corruption (!prev): 0x000000000b9cd950 ***
======= Backtrace: =========
/lib64/libc.so.6[0x347ee7230f]
/lib64/libc.so.6(cfree+0x4b)[0x347ee7276b]
qsub[0x402a12]
qsub[0x402da4]
qsub[0x404670]
qsub[0x406a2c]
qsub[0x40702a]
qsub[0x4076ee]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x347ee1d994]
qsub[0x402559]
======= Memory map: ========
00400000-0040b000 r-xp 00000000 08:02 3228554                            /usr/bin/qsub
0060a000-0060c000 rw-p 0000a000 08:02 3228554                            /usr/bin/qsub
0060c000-0060e000 rw-p 0060c000 00:00 0
0b9cd000-0b9ee000 rw-p 0b9cd000 00:00 0                                  [heap]
347ea00000-347ea1c000 r-xp 00000000 08:02 6537318                        /lib64/ld-2.5.so
347ec1b000-347ec1c000 r--p 0001b000 08:02 6537318                        /lib64/ld-2.5.so
347ec1c000-347ec1d000 rw-p 0001c000 08:02 6537318                        /lib64/ld-2.5.so
347ee00000-347ef4e000 r-xp 00000000 08:02 6537331                        /lib64/libc-2.5.so
347ef4e000-347f14d000 ---p 0014e000 08:02 6537331                        /lib64/libc-2.5.so
347f14d000-347f151000 r--p 0014d000 08:02 6537331                        /lib64/libc-2.5.so
347f151000-347f152000 rw-p 00151000 08:02 6537331                        /lib64/libc-2.5.so
347f152000-347f157000 rw-p 347f152000 00:00 0
3480600000-348060d000 r-xp 00000000 08:02 6537355                        /lib64/libgcc_s-4.1.2-20080825.so.1
348060d000-348080d000 ---p 0000d000 08:02 6537355                        /lib64/libgcc_s-4.1.2-20080825.so.1
348080d000-348080e000 rw-p 0000d000 08:02 6537355                        /lib64/libgcc_s-4.1.2-20080825.so.1
2af4638d2000-2af4638d4000 rw-p 2af4638d2000 00:00 0
2af4638d4000-2af4638fe000 r-xp 00000000 08:02 3244368                    /usr/lib/libtorque.so.2.0.0
2af4638fe000-2af463afe000 ---p 0002a000 08:02 3244368                    /usr/lib/libtorque.so.2.0.0
2af463afe000-2af463b00000 rw-p 0002a000 08:02 3244368                    /usr/lib/libtorque.so.2.0.0
2af463b00000-2af463be5000 rw-p 2af463b00000 00:00 0
2af463bfb000-2af463bfc000 rw-p 2af463bfb000 00:00 0
7fff42ff8000-7fff43021000 rw-p 7ffffffd5000 00:00 0                      [stack]
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0                  [vdso]
Aborted


This happens if and only if I use the following in the script (or pass it as -W
on the command line):

#PBS -W stagein=CREAM798436615_jobWrapper.sh@gb-ce-ams.els.sara.nl:/opt/glite/var/cream_sandbox/ops/_O_dutchgrid_O_users_O_sara_CN_Ernst_Pijper_ops_Role_lcgadmin_Capability_NULL_ops001/79/CREAM798436615/CREAM798436615_jobWrapper.sh,stagein=cream_798436615.proxy@gb-ce-ams.els.sara.nl:/opt/glite/var/cream_sandbox/ops/_O_dutchgrid_O_users_O_sara_CN_Ernst_Pijper_ops_Role_lcgadmin_Capability_NULL_ops001/proxy/12881031212E589696sam2Dwms2Egrid2Esara2Enl13921030130163

When I remove one char or add at least 2 chars at the end, all goes well.

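We have had a look, and the source of this bug appears to be the following line
in qsub.c (version 2.5.3), in the function smart_strtok (line 375):

tmpLineSize = (line == NULL) ? strlen(*ptrPtr+ 1) : strlen(line) + 1;

The "+ 1" sits inside the strlen() call, so for a non-empty string the result is
strlen(*ptrPtr) - 1 rather than strlen(*ptrPtr) + 1, which is two bytes short of
the string plus its terminating NUL. It seems this needs to be:

tmpLineSize = (line == NULL) ? strlen(*ptrPtr)+ 1 : strlen(line) + 1;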
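
To make the difference between the two expressions concrete, here is a minimal
sketch in C. It is not taken from qsub.c: the helper names size_as_reported,
size_as_proposed and copy_with_proposed_size are hypothetical and exist only
for illustration, and the sketch assumes the computed size is later used to
allocate a buffer for a copy of the string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the expression quoted from qsub.c: strlen() is applied to s + 1,
 * so for a non-empty s the result is strlen(s) - 1.
 * (Hypothetical helper, not part of TORQUE.) */
static size_t size_as_reported(const char *s)
{
    assert(s != NULL && s[0] != '\0'); /* s + 1 is only readable if s is non-empty */
    return strlen(s + 1);
}

/* Mirrors the proposed fix: string length plus one byte for the NUL. */
static size_t size_as_proposed(const char *s)
{
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Assumed usage: allocate a buffer of the computed size and copy the string,
 * including its terminating NUL, into it. Returns NULL if malloc fails. */
static char *copy_with_proposed_size(const char *s)
{
    size_t n = size_as_proposed(s);
    char *buf = malloc(n);

    if (buf != NULL)
        memcpy(buf, s, n); /* n bytes: the characters plus the NUL */
    return buf;
}
```

For any non-empty string the two sizes differ by exactly two bytes, so a copy
into a buffer sized with the first expression would write past its end.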


We have seen this error in 2.4.10 and 2.5.3.
Version 2.3.7 works fine, although we have not looked at its sources.
