[Mauiusers] Problem with running multicore jobs on multi nodes

Piotr Brona pbrona at gmail.com
Fri Nov 26 17:33:52 MST 2010


Hi,

My problem concerns the interaction between maui-3.3, torque-2.5.3 and
openmpi-1.4. Specifically, I cannot run multicore jobs across multiple nodes. I
have read all the threads related to this problem and could not find a solution.
I think the problem lies in the maui-3.3 scheduler, because if I disable it and
use the pbs scheduler everything works fine. I have read a lot about
ENABLEMULTIREQJOBS and JOBNODEMATCHPOLICY, and I understand that these
parameters are needed to run MPI jobs on a cluster. I set them in the maui
config file, but when I run the showconfig command they do not appear as set.
Below are my system settings, maui config file and output from the showconfig
command.

---------- System Settings ----------

[root at ori1 ~]# uname -a
Linux ori1 2.6.18-194.17.4.el5xen #1 SMP Tue Oct 26 12:37:47 CEST 2010 x86_64
x86_64 x86_64 GNU/Linux

[root at ori1 ~]# maui --version
Maui version 3.3
Copyright 2000-2010 Cluster Resources, Inc, All Rights Reserved
  for the latest release, see http://clusterresources.com/maui
This software includes the Maui Server Module, Copyright 1996 MHPCC, All Rights
Reserved
This software utilizes the Moab Scheduling Library, version 3.3
Copyright 2000-2010 Cluster Resources, Inc, All Rights Reserved

[root at ori1 ~]# pbs_server --version
version: 2.5.3

[root at ori1 ~]# /usr/lib64/openmpi/1.4-gcc/bin/mpiexec --version
mpiexec (OpenRTE) 1.4

Report bugs to http://www.open-mpi.org/community/help/

[root at ori1 x86_64]# /usr/lib64/openmpi/1.4-gcc/bin/ompi_info | grep tm
	MCA memory: ptmalloc2 (MCA v2.0, API v2.0, Component v1.4)
		MCA ras: tm (MCA v2.0, API v2.0, Component v1.4)
		MCA plm: tm (MCA v2.0, API v2.0, Component v1.4)

-------------------------------------

---------- maui.cfg ----------

# maui.cfg 3.3

SERVERHOST		ori1
ADMIN1			root

RMCFG[ori1]		TYPE=PBS

RMPOLLINTERVAL		00:00:10

SERVERPORT		40559
SERVERMODE		NORMAL

LOGFILE			/var/spool/maui/logs/maui.log
LOGFILEMAXSIZE		100000000
LOGLEVEL		7

QUEUETIMEWEIGHT		1 

BACKFILLPOLICY		FIRSTFIT
RESERVATIONPOLICY	CURRENTHIGHEST

NODEALLOCATIONPOLICY	MINRESOURCE

ENABLEMULTIREQJOBS	TRUE
ENABLEMULTINODEJOBS	TRUE

JOBNODEMATCHPOLICY	EXACTNODE

NODEACCESSPOLICY	SHARED

----------------------------------

---------- showconfig ----------

[root at ori1 ~]# showconfig
NODELOADPOLICY			ADJUSTSTATE
JOBNODEMATCHPOLICY[1]

JOBMAXSTARTTIME[1]		INFINITY

METAMAXTASKS[1]			0
NODESETPOLICY[1]		[NONE]
NODESETATTRIBUTE[1]		[NONE]
NODESETLIST[1]
NODESETDELAY[1]			00:00:00
NODESETPRIORITYTYPE[1]		MINLOSS
NODESETTOLERANCE[1]		0.00

# Priority Weights

XFMINWCLIMIT[1]			00:00:00

RMAUTHTYPE[0]			CHECKSUM

CLASSCFG[simple]		DEFAULT.FEATURES=[NONE]
QOSPRIORITY[0]			0
QOSQTWEIGHT[0]			0
QOSXFWEIGHT[0]			0
QOSTARGETXF[0]			0.00
QOSTARGETQT[0]			00:00:00
QOSFLAGS[0]
QOSPRIORITY[1]			0
QOSQTWEIGHT[1]			0
QOSXFWEIGHT[1]			0
QOSTARGETXF[1]			0.00
QOSTARGETQT[1]			00:00:00
QOSFLAGS[1]
RESDEPTH			24

SCHEDCFG[]			MODE=NORMAL SERVER=ori1:40559 
# RM MODULES: PBS SSS WIKI NATIVE 
TYPE=PBS
SIMEXITITERATION		-1

--------------------------------------
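
One way to cross-check the two listings above is to compare the parameter
names set in maui.cfg with what showconfig reports. The following is only a
minimal sketch: check_params is a hypothetical helper (not part of Maui), it
takes both files as arguments because the location of maui.cfg is not stated
in this message, and it only matches on parameter names with any [index]
suffix stripped.

```shell
#!/bin/sh
# check_params CFG_FILE SHOWCONFIG_FILE   (hypothetical helper, not a Maui tool)
# For each parameter named in CFG_FILE, report whether SHOWCONFIG_FILE
# lists it with a value ("set"), lists it without one ("empty"),
# or does not list it at all ("missing").
check_params() {
    cfg=$1
    shown=$2
    # Parameter names: first field of every non-blank, non-comment line,
    # with any [index] suffix removed, e.g. RMCFG[ori1] -> RMCFG.
    awk '!/^[ \t]*(#|$)/ { sub(/\[.*/, "", $1); print $1 }' "$cfg" | sort -u |
    while read -r name; do
        awk -v n="$name" '
            { key = $1; sub(/\[.*/, "", key) }
            key == n { found = 1; if (NF > 1) hasval = 1 }
            END {
                if (!found)      print n ": missing"
                else if (hasval) print n ": set"
                else             print n ": empty"
            }' "$shown"
    done
}
```

Usage would look something like `showconfig > /tmp/showconfig.out` followed by
`check_params /path/to/maui.cfg /tmp/showconfig.out`, where both paths are
placeholders. With the listings above, JOBNODEMATCHPOLICY would be reported as
"empty" and ENABLEMULTIREQJOBS as "missing".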

Please help me resolve my problem.
Thanks in advance.

Best Regards
Piotr Brona


