[torqueusers] 4.1.5.1 memory leak

Steven Lo slo at cacr.caltech.edu
Fri Dec 6 10:52:56 MST 2013


Hi David,

The nodes we observed are running the following versions:

-bash-3.2# ldd /opt/torque/sbin/pbs_mom | grep libc.so
     libc.so.6 => /lib64/libc.so.6 (0x00002b18eae2a000)

-bash-3.2# ldd --version
ldd (GNU libc) 2.5


-bash-3.2# qstat --version
Version: 4.1.5.1
Revision:

-bash-3.2# uname -a
Linux zwicky005 2.6.18-308.1.1.el5 #1 SMP Fri Feb 17 16:51:01 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
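The same details can be gathered in one pass on each node, which makes it easier to compare nodes that show the growth with nodes that do not. A minimal sketch, assuming bash on a Linux node with glibc: `getconf GNU_LIBC_VERSION` is a standard glibc query, while the function name `mom_diag` is hypothetical. Note that `ulimit -s` reports the limit of the shell it runs in; the value that matters is the one in effect when pbs_mom is started.

```shell
#!/bin/bash
# Print the details relevant to the pbs_mom memory question on this node.
# mom_diag is a hypothetical helper name, not part of TORQUE.
mom_diag() {
    echo "host:   $(uname -n)"
    echo "kernel: $(uname -r)"
    # glibc reports its own version via getconf; fall back if unsupported.
    echo "libc:   $(getconf GNU_LIBC_VERSION 2>/dev/null || echo unknown)"
    # Soft stack limit of this shell, in kB (or "unlimited").
    echo "stack:  $(ulimit -s)"
}

mom_diag
```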



We see that pbs_mom is using about 3 GB of memory:

-bash-3.2# top -p 16695

top - 09:46:45 up 81 days,  1:01,  1 user,  load average: 9.19, 9.17, 9.11
Tasks:   1 total,   0 running,   1 sleeping,   0 stopped,   0 zombie
Cpu(s): 74.6%us,  0.7%sy,  0.0%ni, 24.6%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  24675856k total, 24286304k used,   389552k free,   497860k buffers
Swap: 49150856k total,  4750564k used, 44400292k free, 10798448k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
16695 root      15   0 3195m 3.1g 7052 S  0.3 13.1  77:50.71 pbs_mom
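top rounds VIRT and RES, so to track pbs_mom's growth over time it may be easier to read the kernel's figures directly. A minimal sketch, assuming a Linux /proc filesystem; `mom_mem` is a hypothetical helper name, and `pgrep -o` selects the oldest matching process.

```shell
#!/bin/bash
# Show virtual size, resident size (both in kB) and thread count for a PID,
# as reported by the kernel in /proc/<pid>/status.
# mom_mem is a hypothetical helper name, not part of TORQUE.
mom_mem() {
    local status="/proc/$1/status"
    if [ ! -r "$status" ]; then
        echo "no such process: $1" >&2
        return 1
    fi
    grep -E '^(VmSize|VmRSS|Threads):' "$status"
}

# Usage: mom_mem "$(pgrep -o pbs_mom)"
```

Logging this output periodically (for example from cron) would show whether VmRSS keeps climbing or levels off after startup.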


We came across this posting but are not sure whether it is relevant:

http://comments.gmane.org/gmane.comp.clustering.torque.user/13557


Thanks for looking into this.

Steven.


On 12/06/2013 09:04 AM, David Beer wrote:
> The issue is that in some versions of libc, the pthread stack size 
> will default to 1000 * <the value set in ulimit -s>, even though 
> TORQUE specifies what stack size each thread should have. I will work 
> to get a list of the versions of libc that have this bug. Ken is the 
> one that discovered this defect, so I'll ask him for the info or ask 
> him to post the info.
>
>
> On Fri, Dec 6, 2013 at 9:02 AM, Gus Correa <gus at ldeo.columbia.edu> wrote:
>
>     David
>
>     For the benefit of all Torque users,
>     could you please disclose all combinations of libc versions
>     and Torque versions that have this problem?
>
>     Thank you,
>     Gus Correa
>
>     On 12/05/2013 08:52 PM, David Beer wrote:
>     > Steven,
>     >
>     > What OS and version of the pthread library (libc) do you have? We know
>     > of a rather large memory leak related to different versions of these
>     > libraries.
>     >
>     >
>     > On Thu, Dec 5, 2013 at 12:01 PM, Steven Lo <slo at cacr.caltech.edu> wrote:
>     >
>     >
>     >     Hi,
>     >
>     >     We've discovered that pbs_mom on most nodes is using over 3GB of
>     >     memory. Is there a known memory leak issue for version 4.1.5.1? If
>     >     so, is there a patch for it, or do we have to upgrade to another
>     >     version such as 4.1.7 or 4.2.6.1?
>     >
>     >     Thanks in advance for your suggestion.
>     >
>     >     Steven.
>     >
>     >     _______________________________________________
>     >     torqueusers mailing list
>     >     torqueusers at supercluster.org
>     >     http://www.supercluster.org/mailman/listinfo/torqueusers
>     >
>     >
>     >
>     >
>     > --
>     > David Beer | Senior Software Engineer
>     > Adaptive Computing
>     >
>     >
>
>
>
>
>
> -- 
> David Beer | Senior Software Engineer
> Adaptive Computing
>
>
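David's explanation suggests a rough check. If each pbs_mom thread gets a stack sized from `ulimit -s` instead of the size TORQUE requests, the memory reserved for stacks is roughly the thread count times that limit. A sketch of that arithmetic under that assumption only; `stack_estimate_kb` is a hypothetical helper. Because `ulimit -s` is in kB, the result is in kB, and it is best compared with VmSize rather than VmRSS, since a reserved stack becomes resident only as its pages are touched.

```shell
#!/bin/bash
# Rough estimate, in kB, of the virtual memory that thread stacks would
# reserve if every thread's stack defaulted to the ulimit -s value.
# stack_estimate_kb is a hypothetical helper; arguments:
#   $1 = number of threads, $2 = stack limit in kB (as printed by ulimit -s)
stack_estimate_kb() {
    local threads="$1" limit_kb="$2"
    if [ "$limit_kb" = "unlimited" ]; then
        echo "stack limit is unlimited; no estimate" >&2
        return 1
    fi
    echo $(( threads * limit_kb ))
}

# Usage (assumes pbs_mom is running and was started with this shell's limit):
#   pid=$(pgrep -o pbs_mom)
#   n=$(awk '/^Threads:/ {print $2}' "/proc/$pid/status")
#   stack_estimate_kb "$n" "$(ulimit -s)"
```

For example, `stack_estimate_kb 10 10240` prints `102400`: 10 threads with a 10240 kB limit would reserve about 100 MB of stack.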
