[torqueusers] resources_used. mem problems
brockp at umich.edu
Thu Oct 18 08:08:49 MDT 2012
Check that you enabled TM support in your OpenMPI build:
We are running OMPI 1.6 but here is what ompi_info shows us:
ompi_info | grep tm
MCA ras: tm (MCA v2.0, API v2.0, Component v1.6)
MCA plm: tm (MCA v2.0, API v2.0, Component v1.6)
MCA ess: tm (MCA v2.0, API v2.0, Component v1.6)
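If you want to script that check rather than eyeball it, here is a minimal sketch. It assumes the `MCA <framework>: tm` line format shown above. The function name `has_tm_support` is made up for illustration and is not part of Open MPI. It only looks at the ras (allocation) and plm (launch) frameworks, which is my assumption about what matters for launching through TM.

```shell
#!/bin/sh
# Hypothetical helper: reads ompi_info output on stdin and succeeds only if
# both the ras and plm frameworks list a tm component.
has_tm_support() {
    awk '
        $1 == "MCA" && $2 == "ras:" && $3 == "tm" { ras = 1 }
        $1 == "MCA" && $2 == "plm:" && $3 == "tm" { plm = 1 }
        END { exit !(ras && plm) }
    '
}

# Usage (assumes ompi_info is on PATH):
#   ompi_info | has_tm_support && echo "TM support built in"
```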
Thus, with TM enabled, mpirun for OpenMPI will use the sister moms to start the ranks on the other nodes. You can see this with pstree.
If you look at your mpich2 jobs and the sister moms don't show the processes, but you instead see
sshd -- hydra_proxy -- mympiprocess
then your mpiexec for mpich2 is not using TM to start the jobs.
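The same distinction can be checked mechanically on a sister node. A rough sketch, assuming you have the ancestry of one rank as a single line like the one above (for example, pasted from pstree output). `launch_path` is a hypothetical name, and matching on `pbs_mom` assumes that is how the MOM daemon shows up in your process list.

```shell
#!/bin/sh
# Hypothetical helper: classify how a rank was started from its ancestry line.
# Prints "tm" if pbs_mom appears in the line, "ssh" if sshd does, else "unknown".
launch_path() {
    case "$1" in
        *pbs_mom*) echo tm ;;
        *sshd*)    echo ssh ;;
        *)         echo unknown ;;
    esac
}

launch_path "sshd -- hydra_proxy -- mympiprocess"   # prints: ssh
```

A result of "ssh" on a sister node means the MOM there never saw the process, so it cannot account for its memory.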
The simplest route for this is to use mpiexec from osc and not use the mpiexec that comes with mpich2:
Though I think the hydra launcher in mpich2 added tm bootstrap support, see:
I think you might want to jump on the mpich2 list and ask about PBS TM support.
Also note that if you go to Torque 4, the tm+mpiexec (osc) stuff is currently all broken; stick with 2.5 for a few more months.
CAEN Advanced Computing
brockp at umich.edu
On Oct 18, 2012, at 9:58 AM, Sreedhar Manchu wrote:
> We have Torque 2.5.12 on one of our new clusters. OS is Red Hat Enterprise Linux Server release 6.2 (Santiago). We installed OpenMPI version 1.4.5 (compiled with the Intel compilers).
> Strangely, our parallel jobs that use OpenMPI 1.4.5 are reporting resources_used.mem as the sum of the memory being used on all the nodes in the job, instead of reporting only the memory being used on the mother superior node (rank=0). But if we run the same job with MVAPICH2, we see only the value from the node with rank=0 for resources_used.mem. On our old clusters, with version 1.4.3 and Torque 2.5.11, we also see only the value from the mother superior node (rank=0).
> Overall, this is very problematic because we ask Moab/Torque to kill jobs that use more memory than they requested or were allocated. We use a qsub wrapper to define memory for each and every job, to avoid nodes crashing and so on. Since it is reporting the memory being used on all the nodes (let's say 100 nodes), the sum is huge, far bigger than the memory on any individual node, and so the job is killed with a message saying it has exceeded the memory allocated.
> Has anyone seen this behavior on your clusters? Given that it is working fine with MVAPICH2 I'm thinking it has to do with OpenMPI 1.4.5 (as it works fine with 1.4.3). We are testing 1.4.3 on our new clusters and plan to test 1.4.5 on our old clusters. But I thought it'd be useful to know whether anyone has any thoughts on it. Please let me know.