[torqueusers] 4GB resources_used.mem limit

Bernd Schubert bernd-schubert at gmx.de
Wed Jun 29 03:13:26 MDT 2005


We have a cluster running a combination of torque + maui. In principle it's
running fine; we have only one pretty annoying problem: torque does not
correctly detect jobs using more than 4GB of memory. For such jobs, qstat
always shows only 'actual_size - 4GB'.
If it were only a qstat problem, we wouldn't care. Unfortunately, it
also prevents torque from killing improperly specified jobs. So it can happen,
and has already happened several times, that one job needed all the memory on
a node, but torque happily started another job on that node, because a user
hadn't properly specified how much memory his/her jobs required and torque
didn't kill those jobs automatically.

We hoped this issue would be solved by installing the 64-bit version of
torque (it's a 32/64-bit biarch Debian system), but this didn't help.
Does anyone here have an idea what's going on, how to debug it, or even how to
solve it? I'm pretty unfamiliar with torque+maui (we don't maintain the basic
stuff ourselves) and also haven't looked into the source code. Thinking in
terms of the C language, I can only imagine that someone has directly
specified a 32-bit integer for the memory variable, but who would do this?
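
To illustrate the suspicion, here is a minimal sketch of how a 32-bit field would produce exactly the 'actual_size - 4GB' reading. It is not taken from the torque source; the function names store_mem_32bit and store_mem_64bit are hypothetical and exist only for this example, and the sketch assumes the memory size is counted in bytes.

```c
#include <stdint.h>

/* Hypothetical helper, NOT torque code: stores a byte count in a 32-bit
 * unsigned field, as an implementation with a too-small type might. */
static uint32_t store_mem_32bit(uint64_t actual_bytes)
{
    /* The conversion keeps only the low 32 bits, i.e. the value
     * modulo 2^32 (4 GiB). */
    return (uint32_t)actual_bytes;
}

/* Hypothetical counterpart: a 64-bit field preserves the value. */
static uint64_t store_mem_64bit(uint64_t actual_bytes)
{
    return actual_bytes;
}
```

With these helpers, a job using 5 GiB (5ULL << 30 bytes) would be recorded as 1 GiB by store_mem_32bit, while anything below 4 GiB passes through unchanged. A fixed-width 32-bit type (or a plain unsigned int on Linux) stays 32 bits wide in a 64-bit build as well, so if something like this is the cause, installing the 64-bit version alone would not be expected to change the behaviour.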

The torque version is 1.2.0p3 and maui is 3.2.6p11-2.

Thanks in advance,

Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
e-mail: bernd.schubert at pci.uni-heidelberg.de
