Bugzilla – Bug 99
qsub crashes with -W option and specific number of chars
Last modified: 2010-11-12 10:19:40 MST
Hi,

When I run qsub with the -W option, it crashes with the following error:

    *** glibc detected *** qsub: double free or corruption (!prev): 0x000000000b9cd950 ***
    ======= Backtrace: =========
    /lib64/libc.so.6[0x347ee7230f]
    /lib64/libc.so.6(cfree+0x4b)[0x347ee7276b]
    qsub[0x402a12]
    qsub[0x402da4]
    qsub[0x404670]
    qsub[0x406a2c]
    qsub[0x40702a]
    qsub[0x4076ee]
    /lib64/libc.so.6(__libc_start_main+0xf4)[0x347ee1d994]
    qsub[0x402559]
    ======= Memory map: ========
    00400000-0040b000 r-xp 00000000 08:02 3228554 /usr/bin/qsub
    0060a000-0060c000 rw-p 0000a000 08:02 3228554 /usr/bin/qsub
    0060c000-0060e000 rw-p 0060c000 00:00 0
    0b9cd000-0b9ee000 rw-p 0b9cd000 00:00 0 [heap]
    347ea00000-347ea1c000 r-xp 00000000 08:02 6537318 /lib64/ld-2.5.so
    347ec1b000-347ec1c000 r--p 0001b000 08:02 6537318 /lib64/ld-2.5.so
    347ec1c000-347ec1d000 rw-p 0001c000 08:02 6537318 /lib64/ld-2.5.so
    347ee00000-347ef4e000 r-xp 00000000 08:02 6537331 /lib64/libc-2.5.so
    347ef4e000-347f14d000 ---p 0014e000 08:02 6537331 /lib64/libc-2.5.so
    347f14d000-347f151000 r--p 0014d000 08:02 6537331 /lib64/libc-2.5.so
    347f151000-347f152000 rw-p 00151000 08:02 6537331 /lib64/libc-2.5.so
    347f152000-347f157000 rw-p 347f152000 00:00 0
    3480600000-348060d000 r-xp 00000000 08:02 6537355 /lib64/libgcc_s-4.1.2-20080825.so.1
    348060d000-348080d000 ---p 0000d000 08:02 6537355 /lib64/libgcc_s-4.1.2-20080825.so.1
    348080d000-348080e000 rw-p 0000d000 08:02 6537355 /lib64/libgcc_s-4.1.2-20080825.so.1
    2af4638d2000-2af4638d4000 rw-p 2af4638d2000 00:00 0
    2af4638d4000-2af4638fe000 r-xp 00000000 08:02 3244368 /usr/lib/libtorque.so.2.0.0
    2af4638fe000-2af463afe000 ---p 0002a000 08:02 3244368 /usr/lib/libtorque.so.2.0.0
    2af463afe000-2af463b00000 rw-p 0002a000 08:02 3244368 /usr/lib/libtorque.so.2.0.0
    2af463b00000-2af463be5000 rw-p 2af463b00000 00:00 0
    2af463bfb000-2af463bfc000 rw-p 2af463bfb000 00:00 0
    7fff42ff8000-7fff43021000 rw-p 7ffffffd5000 00:00 0 [stack]
    ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0 [vdso]
    Aborted

This happens if and only if I use the following directive in the script (or pass the same value to -W on the command line):

    #PBS -W stagein=CREAM798436615_jobWrapper.sh@gb-ce-ams.els.sara.nl:/opt/glite/var/cream_sandbox/ops/_O_dutchgrid_O_users_O_sara_CN_Ernst_Pijper_ops_Role_lcgadmin_Capability_NULL_ops001/79/CREAM798436615/CREAM798436615_jobWrapper.sh,stagein=cream_798436615.proxy@gb-ce-ams.els.sara.nl:/opt/glite/var/cream_sandbox/ops/_O_dutchgrid_O_users_O_sara_CN_Ernst_Pijper_ops_Role_lcgadmin_Capability_NULL_ops001/proxy/12881031212E589696sam2Dwms2Egrid2Esara2Enl13921030130163

When I remove one character from the end of this value, or add at least two, everything works.

We looked into it, and the source of the bug appears to be the following line in the function smart_strtok (line 375 of qsub.c, version 2.5.3):

    tmpLineSize = (line == NULL) ? strlen(*ptrPtr+ 1) : strlen(line) + 1;

It seems this needs to be:

    tmpLineSize = (line == NULL) ? strlen(*ptrPtr) + 1 : strlen(line) + 1;

With the parenthesis misplaced, strlen() measures the string starting at its second character, so the buffer ends up two bytes smaller than the string plus its terminating NUL.

We have seen this error in 2.4.10 and 2.5.3. 2.3.7 works fine, although we have not looked at its source.
I was able to reproduce this bug and then verified the fix. It has been checked into 2.4-fixes, 2.5-fixes, 3.0 and trunk.