[torqueusers] 4GB resources_used.mem limit

Garrick Staples garrick at usc.edu
Thu Jun 30 14:44:19 MDT 2005


On Wed, Jun 29, 2005 at 11:13:26AM +0200, Bernd Schubert alleged:
> Hello,
> 
> We have a cluster running a combination of torque + maui.  In principle it's
> running fine; we only have one pretty annoying problem: torque does not
> detect jobs using more than 4GB.  qstat always shows only
> 'actual_size - 4GB' for jobs with more than 4GB.

I'm not able to test this, but the first thing you need to do is figure out
whether pbs_mom is reporting the wrong info or pbs_server is breaking it.
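
One possible reading of the 'actual_size - 4GB' symptom, offered as an
assumption and not something confirmed in this thread, is that a byte count is
being truncated to 32 bits somewhere between pbs_mom and qstat. A minimal
sketch of that arithmetic (the function name truncated_to_32_bits is
hypothetical and not part of torque):

```python
# Hypothetical illustration only: shows what would be reported if a
# byte count were stored in a 32-bit unsigned integer. Nothing here is
# taken from the torque source code.
GIB = 1024 ** 3
WRAP = 2 ** 32  # 4 GiB, the number of values a 32-bit unsigned counter holds


def truncated_to_32_bits(actual_bytes: int) -> int:
    """Return the value left after keeping only the low 32 bits."""
    return actual_bytes % WRAP


if __name__ == "__main__":
    for actual_gib in (1, 3, 5, 6):
        shown = truncated_to_32_bits(actual_gib * GIB)
        print(f"actual {actual_gib} GiB -> reported {shown // GIB} GiB")
```

If the numbers seen on the cluster follow this pattern, the remaining question
is which daemon does the truncating, which is what the checks below are for.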

You can query this info directly from pbs_mom using momctl or a small util I
wrote a while ago called dumpmom (http://www-rcf.usc.edu/~garrick/dumpmom.c).

To use momctl, first get the session list, then get the memory usage of that
session.  Here's an example with a node that has two sessions, one of which is
using a bit over 100MB.

   $ momctl -q sessions -h hpc0961
     hpc0961:     sessions = 'sessions=30631 30651'
   $ momctl -q 'mem[session=30631]' -h hpc0961
     hpc0961: mem[session=30631] = 'mem[session=30631]=120856kb'
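
With many sessions it may be easier to script the two queries above. The
following is a rough sketch, assuming momctl prints lines in exactly the
format shown in the example; the helper names (query_mom, parse_sessions,
parse_mem_kb) are made up for illustration, and only the momctl options
already used above are invoked.

```python
import re
import subprocess
from typing import List, Optional


def query_mom(host: str, attr: str) -> str:
    """Run `momctl -q <attr> -h <host>` and return its raw output."""
    result = subprocess.run(
        ["momctl", "-q", attr, "-h", host],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_sessions(output: str) -> List[int]:
    """Extract session IDs from output like: sessions = 'sessions=30631 30651'."""
    match = re.search(r"'sessions=([^']*)'", output)
    if match is None:
        return []
    return [int(token) for token in match.group(1).split()]


def parse_mem_kb(output: str) -> Optional[int]:
    """Extract the kb figure from output like: 'mem[session=30631]=120856kb'."""
    match = re.search(r"=(\d+)kb'", output)
    return int(match.group(1)) if match else None


def session_memory_kb(host: str) -> dict:
    """Map each session ID on `host` to the memory pbs_mom reports, in kb."""
    usage = {}
    for sid in parse_sessions(query_mom(host, "sessions")):
        usage[sid] = parse_mem_kb(query_mom(host, f"mem[session={sid}]"))
    return usage
```

Calling session_memory_kb('hpc0961') on a machine with momctl installed would
give one number per session to compare against what qstat shows for the job.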

dumpmom is easier for this particular purpose: just run 'dumpmom hpc0961' and
it will print out lots of similar information.

If you can verify that pbs_mom is sending the correct info, then we can look
into pbs_server.


-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California