[Mauiusers] Problems with Maui + SLURM

Josh Butikofer josh at clusterresources.com
Tue Nov 28 14:26:07 MST 2006


Could you run Maui under gdb (or another debugger) to see where it is terminating when the job
completes? Set the environment variable MOABDEBUG=YES, then start Maui with "gdb maui" and
enter "run" to get the daemon running. If Maui crashes or exits, the gdb prompt will
reappear; use "where" to get a stack trace and e-mail it to us. That will help
us track down the crash.
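For reference, the whole session looks something like this (the gdb prompt lines are
illustrative, not literal output):

```shell
# Enable verbose scheduler debug output (MOABDEBUG is the variable
# named above)
export MOABDEBUG=YES

# Then, from the Maui install directory:
#   $ gdb maui
#   (gdb) run        # starts the maui daemon under the debugger
#   ... submit a job; when maui dies, the (gdb) prompt returns ...
#   (gdb) where      # prints the stack trace to send to the list
```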

-- 
Joshua Butikofer
Cluster Resources, Inc.

josh at clusterresources.com
Voice: (801) 717-3707
Fax:   (801) 717-3738
--------------------------


vesor at 163.com wrote:
> I configured two slurm nodes, "node10" with 4 processors and "node7" with 2 
> processors. When I submit a job, the job completes but Maui also terminates,
> so I have to start Maui again if I want to submit another job. :(
> Furthermore, when I use "srun -N2 hostname", it seems only one proc has been
> allocated to the job, and the job can't run.
> 
> slurm version is 1.1 and maui version is 3.2.6p18
> 
> ############################
> [root at node10 root]# srun -n6 hostname
> srun: Warning: Requested partition configuration not available now
> srun: job 6 queued and waiting for resources
> srun: job 6 has been allocated resources
> node7
> node10
> node7
> node10
> node7
> node7
> [root at node10 root]# tail -100 /usr/local/maui/log/maui.log
> 11/28 10:05:51 MResAdjust(NULL,0,0)
> 11/28 10:05:51 MStatInitializeActiveSysUsage()
> 11/28 10:05:51 MStatClearUsage([NONE],Active)
> 11/28 10:05:51 ServerUpdate()
> 11/28 10:05:51 MSysUpdateTime()
> 11/28 10:05:51 INFO:     starting iteration 1
> 11/28 10:05:51 MRMGetInfo()
> 11/28 10:05:51 MClusterClearUsage()
> 11/28 10:05:51 MRMClusterQuery()
> 11/28 10:05:51 MWikiClusterLoadInfo(node10,RCount,EMsg,SC)
> 11/28 10:05:51 MWikiDoCommand(node10,7321,9000000,NONE,CMD=GETNODES ARG=0:ALL,Data,DataSize,SC)
> 11/28 10:05:51 MSUSendData(S,9000000,FALSE,FALSE)
> 11/28 10:05:51 INFO:     packet sent (31 bytes of 31)
> 11/28 10:05:51 INFO:     command sent to server
> 11/28 10:05:51 INFO:     message sent: 'CMD=GETNODES ARG=0:ALL'
> 11/28 10:05:51 MSURecvData(S,9000000,FALSE,SC,EMsg)
> 11/28 10:05:51 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
> 11/28 10:05:51 MSURecvPacket(7,BufP,161,NULL,9000000,SC)
> 11/28 10:05:51 MSUDisconnect(S)
> 11/28 10:05:51 INFO:     received node list through WIKI RM
> 11/28 10:05:51 INFO:     loading 2 node(s)
> 11/28 10:05:51 MWikiNodeUpdate(AList,node10)
> 11/28 10:05:51 MWikiNodeUpdate(AList,node7)
> 11/28 10:05:51 INFO:     2 WIKI resources detected on RM node10
> 11/28 10:05:51 INFO:     resources detected: 2
> 11/28 10:05:51 MRMWorkloadQuery()
> 11/28 10:05:51 MWikiWorkloadQuery(node10,JCount,SC)
> 11/28 10:05:51 MWikiDoCommand(node10,7321,9000000,NONE,CMD=GETJOBS ARG=0:ALL,Data,DataSize,SC)
> 11/28 10:05:51 MSUSendData(S,9000000,FALSE,FALSE)
> 11/28 10:05:51 INFO:     packet sent (30 bytes of 30)
> 11/28 10:05:51 INFO:     command sent to server
> 11/28 10:05:51 INFO:     message sent: 'CMD=GETJOBS ARG=0:ALL'
> 11/28 10:05:51 MSURecvData(S,9000000,FALSE,SC,EMsg)
> 11/28 10:05:51 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
> 11/28 10:05:51 MSURecvPacket(7,BufP,351,NULL,9000000,SC)
> 11/28 10:05:51 MSUDisconnect(S)
> 11/28 10:05:51 INFO:     received job list through WIKI RM
> 11/28 10:05:51 INFO:     loading 2 job(s)
> 11/28 10:05:51 MWikiJobLoad(6,UPDATETIME=1164679547;STATE=Idle;WCLIMIT=0;TASKS=1;QUEUETIME=1164679547;UNAME=root;GNAME=root;PARTITIONMASK=test;NODES=1;RMEM=1;RDISK=1;,J,TaskList,node10)
> 11/28 10:05:51 MReqCreate(6,SrcRQ,DstRQ,DoCreate)
> 11/28 10:05:51 MUGetIndex(UPDATETIME=1164679547,ValList,0)
> 11/28 10:05:51 MUGetIndex(STATE=Idle,ValList,0)
> 11/28 10:05:51 MUGetIndex(WCLIMIT=0,ValList,0)
> 11/28 10:05:51 MUGetIndex(TASKS=1,ValList,0)
> 11/28 10:05:51 MUGetIndex(QUEUETIME=1164679547,ValList,0)
> 11/28 10:05:51 MUGetIndex(UNAME=root,ValList,0)
> 11/28 10:05:51 MUGetIndex(GNAME=root,ValList,0)
> 11/28 10:05:51 MUGetIndex(PARTITIONMASK=test,ValList,0)
> 11/28 10:05:51 MUGetIndex(NODES=1,ValList,0)
> 11/28 10:05:51 MUGetIndex(RMEM=1,ValList,0)
> 11/28 10:05:51 MUGetIndex(RDISK=1,ValList,0)
> 11/28 10:05:51 MJobSetCreds(6,root,root,)
> 11/28 10:05:51 INFO:     default QOS for job 6 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 11/28 10:05:51 INFO:     default QOS for job 6 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 11/28 10:05:51 INFO:     job '6' loaded:   1     root     root      0       Idle   0 1164679547   [NONE] [NONE] [NONE] >=      1 >=      1 [NONE] 1164679547
> 11/28 10:05:51 INFO:     2 WIKI jobs detected on RM node10
> 11/28 10:05:51 INFO:     jobs detected: 2
> 11/28 10:05:51 MStatClearUsage(node,Active)
> 11/28 10:05:51 MClusterUpdateNodeState()
> 11/28 10:05:51 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
> 11/28 10:05:51 ERROR:    job '6' has NULL WCLimit field
> 11/28 10:05:51 INFO:     job '6' Priority:        1
> 11/28 10:05:51 INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:      0(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
> 11/28 10:05:51 MStatClearUsage([NONE],Active)
> 11/28 10:05:51 INFO:     total jobs selected (ALL): 1/1 
> 11/28 10:05:51 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
> 11/28 10:05:51 ERROR:    job '6' has NULL WCLimit field
> 11/28 10:05:51 INFO:     job '6' Priority:        1
> 11/28 10:05:51 INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:      0(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
> 11/28 10:05:51 MStatClearUsage([NONE],Idle)
> 11/28 10:05:51 INFO:     total jobs selected (ALL): 1/1 
> 11/28 10:05:51 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
> 11/28 10:05:51 INFO:     total jobs selected in partition ALL: 1/1 
> 11/28 10:05:51 MQueueScheduleRJobs(Q)
> 11/28 10:05:51 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
> 11/28 10:05:51 INFO:     total jobs selected in partition ALL: 1/1 
> 11/28 10:05:51 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,ALL,FReason,TRUE)
> 11/28 10:05:51 INFO:     job 6 not considered for spanning
> 11/28 10:05:51 INFO:     total jobs selected in partition ALL: 0/1 [PartitionAccess: 1]
> 11/28 10:05:51 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,test,FReason,TRUE)
> 11/28 10:05:51 INFO:     total jobs selected in partition test: 1/1 
> 11/28 10:05:51 MQueueScheduleIJobs(Q,test)
> 11/28 10:05:51 INFO:     6 feasible tasks found for job 6:0 in partition test (1 Needed)
> 11/28 10:05:51 INFO:     tasks located for job 6:  1 of 1 required (6 feasible)
> 11/28 10:05:51 MJobStart(6)
> 11/28 10:05:51 MJobDistributeTasks(6,node10,NodeList,TaskMap)
> 11/28 10:05:51 MAMAllocJReserve(6,RIndex,ErrMsg)
> 11/28 10:05:51 MRMJobStart(6,Msg,SC)
> 11/28 10:05:51 MWikiJobStart(6,node10,Msg,SC)
> 11/28 10:05:51 MWikiDoCommand(node10,7321,9000000,NONE,CMD=STARTJOB ARG=6 TASKLIST=node7,Data,DataSize,SC)
> 11/28 10:05:51 MSUSendData(S,9000000,FALSE,FALSE)
> 11/28 10:05:51 INFO:     packet sent (42 bytes of 42)
> 11/28 10:05:51 INFO:     command sent to server
> 11/28 10:05:51 INFO:     message sent: 'CMD=STARTJOB ARG=6 TASKLIST=node7'
> 11/28 10:05:51 MSURecvData(S,9000000,FALSE,SC,EMsg)
> 11/28 10:05:51 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
> 11/28 10:05:51 MSURecvPacket(7,BufP,77,NULL,9000000,SC)
> 11/28 10:05:51 MSUDisconnect(S)
> 11/28 10:05:51 INFO:     job '6' started through WIKI RM on 1 procs
> 11/28 10:05:51 MStatUpdateActiveJobUsage(6)
> ##############################################
> [root at node10 root]# srun -N2 hostname
> srun: Warning: Requested partition configuration not available now
> srun: job 7 queued and waiting for resources
> 
> [root at node10 root]# showq
> ACTIVE JOBS--------------------
> JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
> 
> 
>      0 Active Jobs       0 of    6 Processors Active (0.00%)
>                          0 of    2 Nodes Active      (0.00%)
> 
> IDLE JOBS----------------------
> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
> 
> 7                      root       Idle     1 99:23:59:59  Tue Nov 28 10:09:17
> 
> 1 Idle Job 
> 
> BLOCKED JOBS----------------
> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
> 
> 
> Total Jobs: 1   Active Jobs: 0   Idle Jobs: 1   Blocked Jobs: 0
> [root at node10 root]# tail -100 /usr/local/maui/log/maui.log
> 11/28 10:09:43 INFO:     message sent: 'CMD=GETJOBS ARG=0:ALL'
> 11/28 10:09:43 MSURecvData(S,9000000,FALSE,SC,EMsg)
> 11/28 10:09:43 MSURecvPacket(7,BufP,9,NULL,9000000,SC)
> 11/28 10:09:43 MSURecvPacket(7,BufP,395,NULL,9000000,SC)
> 11/28 10:09:43 MSUDisconnect(S)
> 11/28 10:09:43 INFO:     received job list through WIKI RM
> 11/28 10:09:43 INFO:     loading 2 job(s)
> 11/28 10:09:43 MWikiUpdateJob(AList,7,0)
> 11/28 10:09:43 MUGetIndex(UPDATETIME=1164679757,ValList,0)
> 11/28 10:09:43 MUGetIndex(STATE=Idle,ValList,0)
> 11/28 10:09:43 MUGetIndex(WCLIMIT=0,ValList,0)
> 11/28 10:09:43 MUGetIndex(TASKS=1,ValList,0)
> 11/28 10:09:43 MUGetIndex(QUEUETIME=1164679757,ValList,0)
> 11/28 10:09:43 MUGetIndex(UNAME=root,ValList,0)
> 11/28 10:09:43 MUGetIndex(GNAME=root,ValList,0)
> 11/28 10:09:43 MUGetIndex(PARTITIONMASK=test,ValList,0)
> 11/28 10:09:43 MUGetIndex(NODES=2,ValList,0)
> 11/28 10:09:43 MUGetIndex(RMEM=1,ValList,0)
> 11/28 10:09:43 MUGetIndex(RDISK=1,ValList,0)
> 11/28 10:09:43 INFO:     2 WIKI jobs detected on RM node10
> 11/28 10:09:43 INFO:     jobs detected: 2
> 11/28 10:09:43 MStatClearUsage(node,Active)
> 11/28 10:09:43 MClusterUpdateNodeState()
> 11/28 10:09:43 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
> 11/28 10:09:43 INFO:     job '7' Priority:        1
> 11/28 10:09:43 INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:      0(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
> 11/28 10:09:43 MStatClearUsage([NONE],Active)
> 11/28 10:09:43 INFO:     total jobs selected (ALL): 1/1 
> 11/28 10:09:43 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
> 11/28 10:09:43 INFO:     job '7' Priority:        1
> 11/28 10:09:43 INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:      0(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
> 11/28 10:09:43 MStatClearUsage([NONE],Idle)
> 11/28 10:09:43 INFO:     total jobs selected (ALL): 1/1 
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
> 11/28 10:09:43 INFO:     total jobs selected in partition ALL: 1/1 
> 11/28 10:09:43 MQueueScheduleRJobs(Q)
> 11/28 10:09:43 MResDestroy(7)
> 11/28 10:09:43 MResChargeAllocation(7,2)
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
> 11/28 10:09:43 INFO:     total jobs selected in partition ALL: 1/1 
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,ALL,FReason,TRUE)
> 11/28 10:09:43 INFO:     job 7 not considered for spanning
> 11/28 10:09:43 INFO:     total jobs selected in partition ALL: 0/1 [PartitionAccess: 1]
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,test,FReason,TRUE)
> 11/28 10:09:43 INFO:     total jobs selected in partition test: 1/1 
> 11/28 10:09:43 MQueueScheduleIJobs(Q,test)
> 11/28 10:09:43 INFO:     6 feasible tasks found for job 7:0 in partition test (1 Needed)
> 11/28 10:09:43 INFO:     tasks located for job 7:  2 of 1 required (6 feasible)
> 11/28 10:09:43 MJobStart(7)
> 11/28 10:09:43 MJobDistributeTasks(7,node10,NodeList,TaskMap)
> 11/28 10:09:43 ALERT:    inadequate tasks allocated to job
> 11/28 10:09:43 WARNING:  cannot distribute allocated tasks for job '7'
> 11/28 10:09:43 ERROR:    cannot start job '7' in partition test
> 11/28 10:09:43 MJobPReserve(7,test,ResCount,ResCountRej)
> 11/28 10:09:43 MJobReserve(7,Priority)
> 11/28 10:09:43 INFO:     6 feasible tasks found for job 7:0 in partition test (1 Needed)
> 11/28 10:09:43 INFO:     6 feasible tasks found for job 7:0 in partition test (1 Needed)
> 11/28 10:09:43 INFO:     located resources for 1 tasks (6) in best partition test for job 7 at time 00:00:01
> 11/28 10:09:43 INFO:     tasks located for job 7:  2 of 1 required (6 feasible)
> 11/28 10:09:43 MJobDistributeTasks(7,node10,NodeList,TaskMap)
> 11/28 10:09:43 ALERT:    inadequate tasks allocated to job
> 11/28 10:09:43 MResJCreate(7,MNodeList,00:00:01,Priority,Res)
> 11/28 10:09:43 INFO:     job '7' reserved 1 tasks (partition test) to start in 00:00:01 on Tue Nov 28 10:09:44
>  (WC: 8639999)
> Active Jobs------
> ------------------
> 11/28 10:09:43 INFO:     resources available after scheduling: N: 2  P: 6
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,TRUE)
> 11/28 10:09:43 INFO:     total jobs selected in partition ALL: 1/1 
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,ALL,FReason,TRUE)
> 11/28 10:09:43 INFO:     job 7 not considered for spanning
> 11/28 10:09:43 INFO:     total jobs selected in partition ALL: 0/1 [PartitionAccess: 1]
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,test,FReason,TRUE)
> 11/28 10:09:43 INFO:     total jobs selected in partition test: 1/1 
> 11/28 10:09:43 MQueueBackFill(BFQueue,HARD,test)
> 11/28 10:09:43 MBFGetWindow(BFNodeCount,BFTaskCount,BFNodeList,BFTime,0,test,[ALL],[ALL],[ALL],'NC 0',1,DRes,NULL,NULL,NULL,NULL)
> 11/28 10:09:43 MJobSetCreds(BFWindow,[ALL],[ALL],[ALL])
> 11/28 10:09:43 INFO:     default QOS for job BFWindow set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 11/28 10:09:43 INFO:     backfill window:  time:   INFINITY  nodes:   2  tasks:   6  mintime:     0 (idle nodes: 2)
> 11/28 10:09:43 INFO:     backfill window obtained [2 nodes/6 procs :   INFINITY]
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,HARD,2,6,975320217,test,FReason,FALSE)
> 11/28 10:09:43 INFO:     total jobs selected in partition test: 1/1 
> 11/28 10:09:43 MBFFirstFit(BFQueue,4,BFNodeList,975320217,2,6,test)
> 11/28 10:09:43 INFO:     partition test nodes/procs available after MBFFirstFit: 2/6 (0 jobs examined)
> 11/28 10:09:43 MBFGetWindow(BFNodeCount,BFTaskCount,BFNodeList,BFTime,975320217,test,[ALL],[ALL],[ALL],'NC 0',1,DRes,NULL,NULL,NULL,NULL)
> 11/28 10:09:43 MJobSetCreds(BFWindow,[ALL],[ALL],[ALL])
> 11/28 10:09:43 INFO:     default QOS for job BFWindow set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 11/28 10:09:43 INFO:     backfill window:  time:   INFINITY  nodes:   0  tasks:   0  mintime: 975320217 (idle nodes: 2)
> 11/28 10:09:43 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
> 11/28 10:09:43 INFO:     total jobs selected in partition ALL: 1/1 
> 11/28 10:09:43 INFO:     job '7' Priority:        1
> 11/28 10:09:43 INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:      0(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
> 11/28 10:09:43 MSchedUpdateStats()
> 11/28 10:09:43 INFO:     iteration:    2   scheduling time:  0.005 seconds
> 11/28 10:09:43 MResUpdateStats()
> 11/28 10:09:43 INFO:     current util[2]:  0/2 (0.00%)  PH: 0.14%  active jobs: 0 of 2 (completed: 0)
> 11/28 10:09:43 MQueueCheckStatus()
> 11/28 10:09:43 MNodeCheckStatus()
> 11/28 10:09:43 MUClearChild(PID)
> 11/28 10:09:43 INFO:     scheduling complete.  sleeping 15 seconds
> ##################################
> [root at node10 root]# cat /etc/slurm.conf 
> # Slurm.conf file generated by configurator.html
> # See the slurm.conf man page for more information
> #
> ControlMachine=node10
> ControlAddr=202.117.10.21
> #BackupController=
> #BackupAddr=
> # 
> SlurmUser=slurm
> SlurmctldPort=7010
> SlurmdPort=7011
> AuthType=auth/none
> JobCredentialPrivateKey=/etc/slurm.key
> JobCredentialPublicCertificate=/etc/slurm.cert
> StateSaveLocation=/tmp
> SlurmdSpoolDir=/tmp/slurmd
> SwitchType=switch/none
> MpiDefault=none
> SlurmctldPidFile=/var/run/slurmctld.pid
> SlurmdPidFile=/var/run/slurmd.pid
> ProctrackType=proctrack/linuxproc
> #PluginDir= 
> CacheGroups=0
> #FirstJobId= 
> ReturnToService=1
> #MaxJobCount= 
> #PlugStackConfig= 
> #PropagatePrioProcess= 
> #PropagateResourceLimits= 
> #PropagateResourceLimitsExcept= 
> #Prolog= 
> #Epilog= 
> #SrunProlog= 
> #SrunEpilog= 
> #TaskProlog= 
> #TaskEpilog= 
> #TaskPlugin= 
> #TmpFs= 
> #UsePAM= 
> # 
> # TIMERS 
> SlurmctldTimeout=300
> SlurmdTimeout=300
> InactiveLimit=0
> MinJobAge=300
> KillWait=30
> Waittime=0
> # 
> # SCHEDULING
> #SchedulerType=sched/backfill
> SchedulerType=sched/wiki
> SchedulerAuth=42
> SchedulerPort=7321
> #SchedulerRootFilter= 
> SelectType=select/cons_res
> FastSchedule=0
> # 
> # LOGGING 
> SlurmctldDebug=5
> SlurmctldLogFile=/var/log/slurmctld.log
> SlurmdDebug=5
> SlurmdLogFile=/var/log/slurmd.log.%h
> JobCompType=jobcomp/filetxt
> JobCompLoc=/var/log/slurm.job.log
> JobAcctType=jobacct/linux
> JobAcctLogfile=/var/log/slurm_jobacct.log
> JobAcctFrequency=30
> # 
> # COMPUTE NODES 
> NodeName=node10
> #NodeAddr=202.117.10.21 
> Procs=2 State=UNKNOWN 
> NodeName=node7
> #NodeAddr=202.17.10.18
> Procs=4 State=UNKNOWN
> PartitionName=test Nodes=node[10,7] Default=YES MaxTime=60 State=UP
> 
> [root at node10 root]# cat /usr/local/maui/maui.cfg 
> # maui.cfg 3.2.6p18
> 
> SERVERHOST            node10
> # primary admin must be first in list
> ADMIN1                root
> 
> # Resource Manager Definition
> 
> RMCFG[node10] TYPE=WIKI
> RMPORT            7321            # or whatever you choose as a port
> RMHOST            node10
> RMAUTHTYPE[node10]  NONE
> 
> PARTITIONMODE ON
> NODECFG[node10]   PARTITION=test
> NODECFG[node7]    PARTITION=test
> 
> # Allocation Manager Definition
> 
> AMCFG[bank]  TYPE=NONE
> 
> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
> # use the 'schedctl -l' command to display current configuration
> 
> RMPOLLINTERVAL        00:00:15
> 
> SERVERPORT            42559
> SERVERMODE            NORMAL
> 
> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
> 
> 
> LOGFILE               maui.log
> LOGFILEMAXSIZE        10000000
> LOGLEVEL              3
> 
> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
> 
> QUEUETIMEWEIGHT       1 
> 
> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
> 
> #FSPOLICY              PSDEDICATED
> #FSDEPTH               7
> #FSINTERVAL            86400
> #FSDECAY               0.80
> 
> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
> 
> # NONE SPECIFIED
> 
> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
> 
> BACKFILLPOLICY        FIRSTFIT
> RESERVATIONPOLICY     CURRENTHIGHEST
> 
> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
> 
> NODEALLOCATIONPOLICY  MINRESOURCE
> 
> # QOS: http://supercluster.org/mauidocs/7.3qos.html
> 
> # QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
> 
> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
> 
> # SRSTARTTIME[test] 8:00:00
> # SRENDTIME[test]   17:00:00
> # SRDAYS[test]      MON TUE WED THU FRI
> # SRTASKCOUNT[test] 20
> # SRMAXTIME[test]   0:30:00
> 
> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
> 
> # USERCFG[DEFAULT]      FSTARGET=25.0
> # USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
> # GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
> # CLASSCFG[batch]       FLAGS=PREEMPTEE
> # CLASSCFG[interactive] FLAGS=PREEMPTOR
> 
> 
> 
> 
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
