Bug 175 - buffer overruns in cpuset.c
Status: NEW
Product: TORQUE
Component: pbs_mom
Version: 3.0.x
Platform: PC Linux
Importance: P5 blocker
Assigned To: Ken Nielson
Reported: 2012-04-19 13:36 MDT by Martin Siegert
Modified: 2012-05-11 09:01 MDT (History)

Attachments
fix buffer overruns in cpuset.c (3.76 KB, patch)
2012-04-24 18:10 MDT, Martin Siegert

Description Martin Siegert 2012-04-19 13:36:19 MDT
I am trying to get TORQUE 3.0.4 working on a UV1000 system with 2048 cores.
There are buffer overruns in add_cpus_to_jobset in cpuset.c:
both cpusbuf and memsbuf are allocated with a fixed length of MAXPATHLEN+1.
However, cpusbuf holds the comma-separated list of cpus (cores), so its
length must be (# of digits in Ncores + 1) * Ncores, where Ncores is the
number of cores in the system. For Ncores = 2048 the required length
is 10240, but MAXPATHLEN is 1024.
The solution is something like:

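char *cpusbuf;
int len_Ncores = 2, cnt;  /* one digit plus one separator (or the final NUL) */
/* add one byte per entry for each further decimal digit of Ncores */
for (cnt = Ncores; cnt > 9; cnt /= 10)
  len_Ncores++;
cpusbuf = (char *)malloc(len_Ncores*Ncores*sizeof(char));
/* cpusbuf must be checked for NULL before use */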

The same applies to memsbuf.
My problem right now is how to get Ncores (and similarly Nmems) in
add_cpus_to_jobset.
For now I can hardcode a length of 10240, but that will break as
soon as somebody tries to run on a machine with more than 2048 cores.

- Martin
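
As a rough illustration of the sizing estimate above, the per-entry length calculation can be wrapped in a function. The sketch below is not taken from TORQUE; the names worst_case_list_len and configured_cpus are hypothetical. It also assumes, without any confirmation in this report, that sysconf(_SC_NPROCESSORS_CONF) (a glibc extension on Linux) would be an acceptable source for Ncores inside pbs_mom.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Worst-case bytes for a comma-separated list of n ids:
 * (digits in n + 1) bytes per entry, as in the estimate above.
 */
static size_t worst_case_list_len(long n)
{
    size_t per_entry = 2;   /* one digit plus a separator or the NUL */
    long cnt;

    assert(n > 0);
    for (cnt = n; cnt > 9; cnt /= 10)
        per_entry++;        /* one more byte per additional digit */
    return per_entry * (size_t)n;
}

/*
 * Assumption: _SC_NPROCESSORS_CONF (glibc extension, not POSIX) reports
 * the number of configured processors. Returns -1 if unavailable.
 */
static long configured_cpus(void)
{
    return sysconf(_SC_NPROCESSORS_CONF);
}
```

For Ncores = 2048 the function returns 10240, matching the figure in the description.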
Comment 1 David Beer 2012-04-19 16:19:47 MDT
I would use the dynamic_string struct instead of trying to calculate these
numbers beforehand. It is fairly easy to use; just check out
dynamic_string.h.
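
To illustrate the general idea of a string that grows on demand, here is a minimal, generic sketch. It is NOT the dynamic_string API from TORQUE (see dynamic_string.h for that); the type grow_str and the functions gs_append and gs_free are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; not the dynamic_string type from TORQUE. */
struct grow_str {
    char  *data;  /* NUL-terminated contents, NULL before the first append */
    size_t len;   /* bytes used, excluding the terminating NUL */
    size_t cap;   /* bytes allocated */
};

/*
 * Append s, doubling the allocation as needed.
 * Returns 0 on success, -1 on allocation failure (contents unchanged).
 */
static int gs_append(struct grow_str *g, const char *s)
{
    size_t add = strlen(s);

    assert(g != NULL);
    if (g->len + add + 1 > g->cap) {
        size_t ncap = g->cap ? g->cap : 64;
        char *p;

        while (ncap < g->len + add + 1)
            ncap *= 2;
        p = realloc(g->data, ncap);
        if (p == NULL)
            return -1;
        g->data = p;
        g->cap = ncap;
    }
    memcpy(g->data + g->len, s, add + 1);  /* copies the NUL as well */
    g->len += add;
    return 0;
}

/* Release the buffer and reset the struct to its empty state. */
static void gs_free(struct grow_str *g)
{
    free(g->data);
    g->data = NULL;
    g->len = g->cap = 0;
}
```

With a buffer like this, the caller never needs to know Ncores in advance: each cpu id is appended as it is encountered and the allocation follows.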
Comment 2 Martin Siegert 2012-04-24 18:10:08 MDT
Created an attachment (id=104)
fix buffer overruns in cpuset.c
Comment 3 Martin Siegert 2012-04-24 18:11:05 MDT
The problem is more complicated: apparently there is a 4095-byte limit
on writes in the kernel VFS.

uv1000:/dev/cpuset/torque/martin # seq -s, 1 1041 | wc
      1       1    4098
uv1000:/dev/cpuset/torque/martin # seq -s, 1 1040 > cpus ; cat cpus
1-1040
uv1000:/dev/cpuset/torque/martin # seq -s, 1 1041 > cpus ; cat cpus
1
uv1000:/dev/cpuset/torque/martin # echo "1-1041" > cpus ; cat cpus
1-1041

Thus, it is not possible to simply increase the size of cpusbuf,
at least not beyond 4095 bytes, which is enough for at most 1040 cpus.
The solution is indicated above: it is possible to write ranges to the
cpus file instead of a comma-separated list of cpus.
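
The byte counts in the transcript above can be checked with a small counting routine. This is only a sketch for verifying the arithmetic; the function names ndigits and comma_list_len are hypothetical, and the 4095-byte figure itself comes from the observed behaviour, not from this code.

```c
#include <assert.h>
#include <stddef.h>

/* Number of decimal digits in v. */
static size_t ndigits(unsigned int v)
{
    size_t d = 1;

    while (v > 9) {
        v /= 10;
        d++;
    }
    return d;
}

/*
 * Bytes produced by `seq -s, first last`: all digits, the commas between
 * entries, and the trailing newline. Assumes first <= last < UINT_MAX.
 */
static size_t comma_list_len(unsigned int first, unsigned int last)
{
    size_t total = 0;
    unsigned int v;

    assert(first <= last);
    for (v = first; v <= last; v++)
        total += ndigits(v) + 1;  /* each entry is followed by ',' or '\n' */
    return total;
}
```

The list 1..1040 comes to 4093 bytes and still fits, while 1..1041 comes to 4098 bytes, the value reported by wc above.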

The attached patch implements such a solution: it first sorts the list of
cpus, then constructs a cpusbuf string that collapses the list of cpus
into ranges as much as possible.
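
The sort-then-collapse approach described above can be sketched as follows. This is an independent illustration, not the code from the attached patch; the helper names cmp_int and collapse_to_ranges are hypothetical, and the output format ("first-last" ranges separated by commas) is assumed from the transcript in this comment.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for ints, ascending. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x > y) - (x < y);
}

/*
 * Hypothetical helper: sort ids[] in place and write them to out as a
 * list such as "0-3,8,10-11". Duplicates are merged.
 * Returns the string length, or -1 if out is too small.
 */
static int collapse_to_ranges(int *ids, size_t n, char *out, size_t cap)
{
    size_t i = 0, used = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';
    if (n == 0)
        return 0;
    qsort(ids, n, sizeof(ids[0]), cmp_int);

    while (i < n) {
        size_t j = i;
        int written;

        /* extend the run while the next id is a duplicate or consecutive */
        while (j + 1 < n && ids[j + 1] - ids[j] <= 1)
            j++;

        if (ids[j] == ids[i])
            written = snprintf(out + used, cap - used, "%s%d",
                               used ? "," : "", ids[i]);
        else
            written = snprintf(out + used, cap - used, "%s%d-%d",
                               used ? "," : "", ids[i], ids[j]);

        if (written < 0 || (size_t)written >= cap - used)
            return -1;      /* output truncated or encoding error */
        used += (size_t)written;
        i = j + 1;
    }
    return (int)used;
}
```

For the ids 5,1,2,3,9,10,7 this produces "1-3,5,7,9-10", and for 1..1041 it produces "1-1041", which is the form the cpus file accepted in the transcript.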