[torqueusers] resources problem
bernd-schubert at gmx.de
Wed Nov 29 07:25:34 MST 2006
We just had the problem that a job of one of our group members used more
resources (memory) than requested, but torque still didn't kill it. Also,
qstat reports far too low resource usage for this particular program. For all
other programs presently running, the reported resources are fine; only this
program is troublesome.
While looking at what's so special about it, we see it's basically an MPI program
compiled with mpicc. However, it is NOT started using mpirun; on our
cluster it's just queued like any other program using 'qstat program_name'.
Some time ago Garrick sent me his "dumpmom" program to analyze
reported resources. Using dumpmom, I can clearly see that qstat reports the
resources used by the starting bash, but doesn't count the resources used by
the program started from this bash. However, dumpmom additionally
reports the data for the program/pid started from this bash.
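
For a manual cross-check on a node, one can sum the resident memory of the
starting bash and everything started beneath it, and compare that with what
qstat shows. The following is only a minimal sketch, assuming Linux nodes with
/proc mounted. The function names read_proc_table and tree_rss_kb are made up
for illustration; they are not part of torque, and this is not meant to
describe how pbs_mom does its own accounting.

```python
import os


def read_proc_table(proc_root="/proc"):
    """Return {pid: (ppid, rss_kb)} for every process readable under proc_root."""
    table = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        ppid, rss_kb = 0, 0
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                for line in fh:
                    if line.startswith("PPid:"):
                        ppid = int(line.split()[1])
                    elif line.startswith("VmRSS:"):
                        # value is given in kB; kernel threads have no VmRSS line
                        rss_kb = int(line.split()[1])
        except OSError:
            continue  # process exited while we were scanning
        table[int(entry)] = (ppid, rss_kb)
    return table


def tree_rss_kb(table, root_pid):
    """Sum rss_kb over root_pid and all of its descendants in table."""
    children = {}
    for pid, (ppid, _) in table.items():
        children.setdefault(ppid, []).append(pid)
    total, stack, seen = 0, [root_pid], set()
    while stack:
        pid = stack.pop()
        if pid in seen or pid not in table:
            continue
        seen.add(pid)
        total += table[pid][1]
        stack.extend(children.get(pid, []))
    return total
```

Calling tree_rss_kb(read_proc_table(), pid) with the pid of the job's bash
gives the total in kB. If that total is much larger than the bash's own
VmRSS, the memory sits in the child processes, which matches what dumpmom
shows here.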
Here I'm lost. I have no idea what I could do or how to debug it. Is this
a bug of mom_priv running on the nodes, or is it a bug of pbs_server?
Thanks in advance,
PCI / Theoretische Chemie