[torqueusers] resources_used.mem problems
tbaer at utk.edu
Thu Oct 18 08:09:01 MDT 2012
On Thu, 2012-10-18 at 09:58 -0400, Sreedhar Manchu wrote:
> Has anyone seen this behavior on your clusters? Given that it is
> working fine with MVAPICH2 I'm thinking it has to do with OpenMPI
> 1.4.5 (as it works fine with 1.4.3). We are testing 1.4.3 on our new
> clusters and plan to test 1.4.5 on our old clusters. But I thought
> it'd be useful to know whether anyone has any thoughts on it. Please
> let me know.
It sounds to me like OpenMPI is doing the right thing here, in that it's
launching processes through the TORQUE TM API so that its resource usage
is being accounted for accurately. OTOH, I'm guessing that your MVAPICH2
install is using either rsh or ssh to start remote processes, which does
*NOT* handle resource usage accounting (or signal delivery) correctly.
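To show why the launch method matters, here is a deliberately simplified Python model. It assumes that pbs_mom attributes a process to a job by session ID, and only counts sessions belonging to tasks it started itself (the job script, or tasks started with tm_spawn()). This is roughly how TORQUE's Linux MOM behaves, but treat it as an approximation. The `Proc` class and `job_mem_kb()` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    sid: int     # session ID of the process
    rss_kb: int  # resident memory in kB


def job_mem_kb(procs, job_sids):
    """Sum the RSS of processes whose session the MOM knows belongs to the job.

    Simplified model (assumption): the MOM only knows the sessions of tasks
    it started itself, so processes started by sshd/rshd in a new session
    are invisible to it and never show up in resources_used.mem.
    """
    return sum(p.rss_kb for p in procs if p.sid in job_sids)


# Hypothetical node: job script (session 100) plus two 500 MB MPI ranks.
procs = [Proc(1001, 100, 2_000), Proc(1002, 200, 500_000), Proc(1003, 300, 500_000)]
print(job_mem_kb(procs, {100, 200, 300}))  # ranks launched via TM: 1002000
print(job_mem_kb(procs, {100}))            # ranks launched via ssh: 2000
```

In the ssh case only the job script's own memory is counted, which is why an rsh/ssh-launched MPI job can look suspiciously "small" next to the same job launched through TM.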
I would recommend getting your MVAPICH2 install to use the TM API to
launch processes, either using the mpiexec.hydra script that likely
comes with MVAPICH2 or using OSC mpiexec.
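One way to check which launcher actually started the ranks is to walk each rank's parent chain on a compute node. The following Linux-only Python sketch reads /proc/<pid>/stat. The helper names `read_proc_table()` and `launcher_of()`, and the list of launcher process names, are assumptions for illustration. Ranks launched via TM should trace back to pbs_mom (possibly through a proxy process), while rsh/ssh-launched ranks should trace back to sshd or an rsh daemon.

```python
import os


def read_proc_table(proc_root="/proc"):
    """Return {pid: (ppid, comm)} for every readable process (Linux only)."""
    table = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                data = f.read()
            # comm is wrapped in parentheses and may contain spaces,
            # so locate it using the last ')' in the line.
            end = data.rindex(")")
            comm = data[data.index("(") + 1 : end]
            fields = data[end + 2 :].split()  # fields[0] = state, fields[1] = ppid
            table[int(entry)] = (int(fields[1]), comm)
        except (OSError, ValueError, IndexError):
            continue  # process exited or its stat file was unreadable
    return table


def launcher_of(table, pid, launchers=("pbs_mom", "sshd", "rshd", "in.rshd")):
    """Walk up the parent chain; return the first ancestor that is a known launcher."""
    seen = set()
    while pid in table and pid not in seen:
        seen.add(pid)
        ppid, comm = table[pid]
        if comm in launchers:
            return comm
        pid = ppid
    return None
```

For example, `launcher_of(read_proc_table(), pid)` for one of the job's rank PIDs should report `pbs_mom` once MVAPICH2 is switched to a TM-aware launcher, and `sshd` while it is still using ssh.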
Troy Baer, Senior HPC System Administrator
National Institute for Computational Sciences, University of Tennessee