[Mauiusers] question on maui 3.2.6p20: can not get job list from WIKI RM

Hien Nguyen hien1 at us.ibm.com
Tue Nov 4 07:19:52 MST 2008


I am running Maui 3.2.6p20 with SLURM 1.3.6. The Maui log shows the 
following errors and alerts:
11/03 23:56:40 ERROR:    command 'CMD=GETNODES ARG=0:ALL'  SC: -300 response: 'NONE'
11/03 23:56:40 ALERT:    cannot get node list from WIKI RM
11/03 23:56:40 ALERT:    cannot load cluster resources on RM (RM 'p6ihopenhpc-ib-3' failed in function 'clusterquery')
11/03 23:56:40 WARNING:  no resources detected
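
To pull only the problem lines out of a longer maui.log, something like the following Python sketch could be used. It assumes every entry follows the "MM/DD HH:MM:SS LEVEL: message" layout seen in the excerpt above; the function name parse_maui_log is hypothetical and not part of Maui.

```python
import re

# Layout assumed from the log excerpt: "MM/DD HH:MM:SS LEVEL:  message"
LOG_LINE = re.compile(
    r"^(\d{2}/\d{2}) (\d{2}:\d{2}:\d{2}) (ERROR|ALERT|WARNING):\s+(.*)$"
)


def parse_maui_log(text):
    """Return (date, time, level, message) tuples for ERROR/ALERT/WARNING lines."""
    entries = []
    for line in text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            entries.append(match.groups())
    return entries


# Sample taken from the log lines quoted in this message.
sample = """\
11/03 23:56:40 ERROR:    command 'CMD=GETNODES ARG=0:ALL'  SC: -300 response: 'NONE'
11/03 23:56:40 ALERT:    cannot get node list from WIKI RM
11/03 23:56:40 WARNING:  no resources detected
"""

for date, time, level, message in parse_maui_log(sample):
    print(level, message)
```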

Can someone tell me what is wrong with my Maui and SLURM configuration?

File maui.cfg:
-------------------------------------
# maui.cfg 3.2.6p20

SERVERHOST            p6ihopenhpc-ib-3
# primary admin must be first in list
ADMIN1                root

# Resource Manager Definition

RMCFG[p6ihopenhpc-ib-3] TYPE=WIKI
RMPORT          7321
RMHOST          p6ihopenhpc-ib-3
RMAUTHTYPE[p6ihopenhpc-ib-3] MUNGE


# Allocation Manager Definition

AMCFG[bank]  TYPE=NONE

# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration

RMPOLLINTERVAL        00:00:20

SERVERPORT            42559
SERVERMODE            NORMAL

# Admin: http://supercluster.org/mauidocs/a.esecurity.html


LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3

# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html

QUEUETIMEWEIGHT       1 

# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html

#FSPOLICY              PSDEDICATED
#FSDEPTH               7
#FSINTERVAL            86400
#FSDECAY               0.80

# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html

# NONE SPECIFIED

# Backfill: http://supercluster.org/mauidocs/8.2backfill.html

BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST

# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html

NODEALLOCATIONPOLICY  MINRESOURCE

# QOS: http://supercluster.org/mauidocs/7.3qos.html

# QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE

# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html

# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test]   17:00:00
# SRDAYS[test]      MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test]   0:30:00

# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html

# USERCFG[DEFAULT]      FSTARGET=25.0
# USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
# GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch]       FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR


PARTITIONMODE ON

NODECFG[p6ihopenhpc-ib-3]       PARTITION=debug
NODECFG[p6ihopenhpc-ib-4]       PARTITION=debug
NODECFG[p6ihopenhpc-ib-5]       PARTITION=debug
NODECFG[p6ihopenhpc-ib-6]       PARTITION=debug

====================================================
File slurm.conf:
-------------------------------------
# slurm.conf file generated by configurator.html.
# See the slurm.conf man page for more information.
#
ControlMachine=p6ihopenhpc-ib-3
ControlAddr=10.2.1.30
BackupController=p6ihopenhpc-ib-1
BackupAddr=10.2.1.10
#
AuthType=auth/munge
#AuthType=auth/none
CacheGroups=0
#CheckpointType=checkpoint/none
#CryptoType=crypto/openssl
CryptoType=crypto/munge
#Epilog=
#FirstJobId=1
JobCredentialPrivateKey=/etc/slurm/slurm.key
JobCredentialPublicCertificate=/etc/slurm/slurm.cert
#JobFileAppend=0
#JobRequeue=1
#MailProg=/bin/mail
#MaxJobCount=5000
MpiDefault=none
#PluginDir=
#PlugStackConfig=
#PrivateData=0
ProctrackType=proctrack/pgid
#Prolog=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/tmp/slurmd
SlurmUser=slurm
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/tmp
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TmpFs=/tmp
#TreeWidth=
#UnkillableStepProgram=
#UnkillableStepTimeout=
#UsePAM=0
#
#
# TIMERS
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
MinJobAge=300
KillWait=30
#MessageTimeout=10
SlurmctldTimeout=300
SlurmdTimeout=300
#UnkillableStepProgram=
#UnkillableStepTimeout=60
Waittime=0
#
#
# SCHEDULING
#DefMemPerTask=0
FastSchedule=1
#MaxMemPerTask=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
#SchedulerType=sched/backfill
SchedulerType=sched/wiki
SchedulerPort=7321
SelectType=select/linear
#SelectTypeParameters=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
#AccountingStorageType=jobacct_storage/none
#AccountingStorageUser=
ClusterName=cluster
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobAcctGatherFrequency=
#JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmctldLogFile=/tmp/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/tmp/slurm/slurmd.log
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=p6ihopenhpc-ib-[3-6] Procs=1 State=UNKNOWN
PartitionName=debug Nodes=p6ihopenhpc-ib-[3-6] Default=YES MaxTime=INFINITE State=UP
-------------------------------------
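
One basic thing to rule out is a mismatch between the two files on the settings that tie Maui to SLURM's wiki interface. The Python sketch below is only an illustration under stated assumptions: the helper names (parse_maui_cfg, parse_slurm_conf, check_wiki_settings) are hypothetical, the parsers handle only the simple "NAME value" and "Key=Value" lines seen above, and the three checks are a minimal sanity test rather than a complete diagnosis.

```python
def parse_maui_cfg(text):
    """Map each Maui parameter name (first token) to the rest of its line."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def parse_slurm_conf(text):
    """Map Key=Value settings; only the first Key=Value pair on a line is kept."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        tokens = value.split()
        params[key.strip()] = tokens[0] if tokens else ""
    return params


def check_wiki_settings(maui, slurm):
    """Return a list of mismatches between the Maui and SLURM settings."""
    problems = []
    if maui.get("RMPORT") != slurm.get("SchedulerPort"):
        problems.append("RMPORT does not match SchedulerPort")
    if maui.get("RMHOST") != slurm.get("ControlMachine"):
        problems.append("RMHOST does not match ControlMachine")
    if slurm.get("SchedulerType") != "sched/wiki":
        problems.append("SchedulerType is not sched/wiki")
    return problems


# Excerpts copied from the two files in this message.
maui_text = """\
RMCFG[p6ihopenhpc-ib-3] TYPE=WIKI
RMPORT          7321
RMHOST          p6ihopenhpc-ib-3
"""
slurm_text = """\
ControlMachine=p6ihopenhpc-ib-3
#SchedulerType=sched/backfill
SchedulerType=sched/wiki
SchedulerPort=7321
"""

print(check_wiki_settings(parse_maui_cfg(maui_text), parse_slurm_conf(slurm_text)))
```

For the excerpts above the returned list is empty, so the port, host, and scheduler type agree between the two files.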

Regards,

 Hien Nguyen
Linux Technology Center (Austin)
 Phone: (512) 838-4140            Tie Line: 678-4140
 e-mail: hien1 at us.ibm.com