Bugzilla – Bug 175
buffer overruns in cpuset.c
Last modified: 2012-05-11 09:01:44 MDT
I am trying to get Torque 3.0.4 working on a UV1000 system with 2048 cores. There are buffer overruns in cpuset.c, in add_cpus_to_jobset: both cpusbuf and memsbuf are allocated with a fixed length of MAXPATHLEN+1. However, cpusbuf holds the comma-separated list of CPUs (cores), so it needs room for up to (number of digits in Ncores + 1) * Ncores bytes, where Ncores is the number of cores in the system. For Ncores = 2048 that is 10240 bytes, while MAXPATHLEN is 1024.

The solution is something like:

    char *cpusbuf;
    int len_Ncores = 2, cnt;

    for (cnt = Ncores; cnt > 9; cnt /= 10)
        len_Ncores++;
    cpusbuf = (char *)malloc(len_Ncores * Ncores * sizeof(char));

Similarly for memsbuf.

My problem right now is how to get Ncores (and similarly Nmems) in add_cpus_to_jobset. For now I can hardcode a length of 10240, but that will break as soon as somebody tries to run on a machine with more than 2048 cores.

- Martin
I would use the dynamic_string struct instead of trying to calculate these numbers beforehand. It is fairly easy to use; see dynamic_string.h.
Created an attachment (id=104): fix buffer overruns in cpuset.c
The problem is more complicated: apparently there is a 4095-byte limit in the kernel VFS for writes to the cpus file.

    uv1000:/dev/cpuset/torque/martin # seq -s, 1 1041 | wc
    1 1 4098
    uv1000:/dev/cpuset/torque/martin # seq -s, 1 1040 > cpus ; cat cpus
    1-1040
    uv1000:/dev/cpuset/torque/martin # seq -s, 1 1041 > cpus ; cat cpus
    1
    uv1000:/dev/cpuset/torque/martin # echo "1-1041" > cpus ; cat cpus
    1-1041

Thus it is not possible to simply enlarge cpusbuf, at least not beyond 4095 bytes, which is enough for an explicit list of only 1040 CPUs. The last command points to the solution: the cpus file accepts ranges, so we can write ranges instead of a comma-separated list of CPUs. The attached patch implements this: it first sorts the list of CPUs, then builds a cpusbuf string that collapses the list into ranges wherever possible.