[Mauiusers] Question on Maui 3.2.6p20: cannot get job list from WIKI RM
Hien Nguyen
hien1 at us.ibm.com
Tue Nov 4 07:19:52 MST 2008
I am running Maui 3.2.6p20 with SLURM 1.3.6. The Maui log contains the following errors and alerts:
11/03 23:56:40 ERROR: command 'CMD=GETNODES ARG=0:ALL' SC: -300 response: 'NONE'
11/03 23:56:40 ALERT: cannot get node list from WIKI RM
11/03 23:56:40 ALERT: cannot load cluster resources on RM (RM 'p6ihopenhpc-ib-3' failed in function 'clusterquery')
11/03 23:56:40 WARNING: no resources detected
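The failing request in the first log line is a Wiki-protocol command made of space-separated KEY=VALUE fields. As a reading aid, here is a minimal Python sketch that splits such a command into its fields. It assumes that the ARG of GETNODES has the form <update-time>:ALL or <update-time>:<node>[:<node>...], as described for SLURM's sched/wiki interface, so ARG=0:ALL would ask for all nodes. The function names are hypothetical helpers, not part of Maui or SLURM.

```python
from typing import Dict, List, Tuple


def parse_wiki_command(cmd: str) -> Dict[str, str]:
    """Split a space-separated KEY=VALUE command string into a dict."""
    fields: Dict[str, str] = {}
    for token in cmd.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"malformed token: {token!r}")
        fields[key] = value
    return fields


def parse_getnodes_arg(arg: str) -> Tuple[int, List[str]]:
    """Split an ARG of the assumed form <updatetime>:ALL or <updatetime>:<node>[:<node>...]."""
    update_time, _, nodes = arg.partition(":")
    return int(update_time), (nodes.split(":") if nodes else [])


if __name__ == "__main__":
    fields = parse_wiki_command("CMD=GETNODES ARG=0:ALL")
    print(fields)                             # {'CMD': 'GETNODES', 'ARG': '0:ALL'}
    print(parse_getnodes_arg(fields["ARG"]))  # (0, ['ALL'])
```

The sketch only decodes the request text; it says nothing about why the resource manager answered with SC: -300.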
Can someone tell me what is wrong with my Maui and SLURM configuration?
File maui.cfg:
-------------------------------------
# maui.cfg 3.2.6p20
SERVERHOST p6ihopenhpc-ib-3
# primary admin must be first in list
ADMIN1 root
# Resource Manager Definition
RMCFG[p6ihopenhpc-ib-3] TYPE=WIKI
RMPORT 7321
RMHOST p6ihopenhpc-ib-3
RMAUTHTYPE[p6ihopenhpc-ib-3] MUNGE
# Allocation Manager Definition
AMCFG[bank] TYPE=NONE
# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
RMPOLLINTERVAL 00:00:20
SERVERPORT 42559
SERVERMODE NORMAL
# Admin: http://supercluster.org/mauidocs/a.esecurity.html
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
QUEUETIMEWEIGHT 1
# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
#FSPOLICY PSDEDICATED
#FSDEPTH 7
#FSINTERVAL 86400
#FSDECAY 0.80
# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
# NONE SPECIFIED
# Backfill: http://supercluster.org/mauidocs/8.2backfill.html
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
NODEALLOCATIONPOLICY MINRESOURCE
# QOS: http://supercluster.org/mauidocs/7.3qos.html
# QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test] 17:00:00
# SRDAYS[test] MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test] 0:30:00
# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
# USERCFG[DEFAULT] FSTARGET=25.0
# USERCFG[john] PRIORITY=100 FSTARGET=10.0-
# GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch] FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR
PARTITIONMODE ON
NODECFG[p6ihopenhpc-ib-3] PARTITION=debug
NODECFG[p6ihopenhpc-ib-4] PARTITION=debug
NODECFG[p6ihopenhpc-ib-5] PARTITION=debug
NODECFG[p6ihopenhpc-ib-6] PARTITION=debug
====================================================
File slurm.conf:
-------------------------------------
# slurm.conf file generated by configurator.html.
# See the slurm.conf man page for more information.
#
ControlMachine=p6ihopenhpc-ib-3
ControlAddr=10.2.1.30
BackupController=p6ihopenhpc-ib-1
BackupAddr=10.2.1.10
#
AuthType=auth/munge
#AuthType=auth/none
CacheGroups=0
#CheckpointType=checkpoint/none
#CryptoType=crypto/openssl
CryptoType=crypto/munge
#Epilog=
#FirstJobId=1
JobCredentialPrivateKey=/etc/slurm/slurm.key
JobCredentialPublicCertificate=/etc/slurm/slurm.cert
#JobFileAppend=0
#JobRequeue=1
#MailProg=/bin/mail
#MaxJobCount=5000
MpiDefault=none
#PluginDir=
#PlugStackConfig=
#PrivateData=0
ProctrackType=proctrack/pgid
#Prolog=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/tmp/slurmd
SlurmUser=slurm
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/tmp
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TmpFs=/tmp
#TreeWidth=
#UnkillableStepProgram=
#UnkillableStepTimeout=
#UsePAM=0
#
#
# TIMERS
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
MinJobAge=300
KillWait=30
#MessageTimeout=10
SlurmctldTimeout=300
SlurmdTimeout=300
#UnkillableStepProgram=
#UnkillableStepTimeout=60
Waittime=0
#
#
# SCHEDULING
#DefMemPerTask=0
FastSchedule=1
#MaxMemPerTask=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
#SchedulerType=sched/backfill
SchedulerType=sched/wiki
SchedulerPort=7321
SelectType=select/linear
#SelectTypeParameters=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
#AccountingStorageType=jobacct_storage/none
#AccountingStorageUser=
ClusterName=cluster
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
#JobAcctGatherFrequency=
#JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=3
SlurmctldLogFile=/tmp/slurm/slurmctld.log
SlurmdDebug=3
SlurmdLogFile=/tmp/slurm/slurmd.log
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=p6ihopenhpc-ib-[3-6] Procs=1 State=UNKNOWN
PartitionName=debug Nodes=p6ihopenhpc-ib-[3-6] Default=YES MaxTime=INFINITE State=UP
-------------------------------------
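To rule out simple mismatches between the two files, the settings that have to line up for the Wiki interface can be compared mechanically. The following is a minimal Python sketch, not a Maui or SLURM tool. It assumes that Maui's RMPORT and RMHOST should match SLURM's SchedulerPort and ControlMachine, and that the NODECFG entries should name the same hosts as the NodeName range. It handles only a single bracketed host range. All function names are hypothetical.

```python
import re
from typing import Dict, List

# Excerpts copied from the two files posted above.
MAUI_EXCERPT = """\
SERVERHOST p6ihopenhpc-ib-3
RMCFG[p6ihopenhpc-ib-3] TYPE=WIKI
RMPORT 7321
RMHOST p6ihopenhpc-ib-3
NODECFG[p6ihopenhpc-ib-3] PARTITION=debug
NODECFG[p6ihopenhpc-ib-4] PARTITION=debug
NODECFG[p6ihopenhpc-ib-5] PARTITION=debug
NODECFG[p6ihopenhpc-ib-6] PARTITION=debug
"""

SLURM_EXCERPT = """\
ControlMachine=p6ihopenhpc-ib-3
SchedulerType=sched/wiki
SchedulerPort=7321
NodeName=p6ihopenhpc-ib-[3-6] Procs=1 State=UNKNOWN
PartitionName=debug Nodes=p6ihopenhpc-ib-[3-6] Default=YES MaxTime=INFINITE State=UP
"""


def expand_hostlist(expr: str) -> List[str]:
    """Expand a host expression with at most one bracketed range, e.g. 'n-[3-6]'."""
    m = re.fullmatch(r"(.*)\[([^\]]+)\](.*)", expr)
    if not m:
        return [expr] if expr else []
    prefix, body, suffix = m.groups()
    hosts: List[str] = []
    for part in body.split(","):
        lo, sep, hi = part.partition("-")
        if not sep:
            hosts.append(f"{prefix}{lo}{suffix}")
            continue
        width = len(lo)  # preserve zero padding such as [01-04]
        for i in range(int(lo), int(hi) + 1):
            hosts.append(f"{prefix}{i:0{width}d}{suffix}")
    return hosts


def parse_slurm_conf(text: str) -> Dict[str, str]:
    """Collect Key=Value pairs from slurm.conf text; later duplicates win."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        for token in line.split("#", 1)[0].split():
            key, sep, value = token.partition("=")
            if sep:
                settings[key] = value
    return settings


def parse_maui_cfg(text: str) -> Dict[str, str]:
    """Collect 'PARAMETER value' lines from maui.cfg text, skipping comments."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split(None, 1)
        if parts:
            settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def check_consistency(maui_text: str, slurm_text: str) -> List[str]:
    """Return human-readable mismatches between the Wiki-related settings."""
    maui = parse_maui_cfg(maui_text)
    slurm = parse_slurm_conf(slurm_text)
    problems: List[str] = []
    if not slurm.get("SchedulerType", "").startswith("sched/wiki"):
        problems.append("slurm.conf: SchedulerType is not a sched/wiki plugin")
    if maui.get("RMPORT") != slurm.get("SchedulerPort"):
        problems.append(f"RMPORT {maui.get('RMPORT')} != SchedulerPort {slurm.get('SchedulerPort')}")
    if maui.get("RMHOST") != slurm.get("ControlMachine"):
        problems.append(f"RMHOST {maui.get('RMHOST')} != ControlMachine {slurm.get('ControlMachine')}")
    maui_nodes = sorted(k[len("NODECFG["):-1] for k in maui if k.startswith("NODECFG["))
    slurm_nodes = sorted(expand_hostlist(slurm.get("NodeName", "")))
    if maui_nodes != slurm_nodes:
        problems.append(f"NODECFG hosts {maui_nodes} != NodeName hosts {slurm_nodes}")
    return problems


if __name__ == "__main__":
    print(check_consistency(MAUI_EXCERPT, SLURM_EXCERPT))  # prints []
```

With the posted values the check returns an empty list, because the port, the controller host, and the node names already agree. An empty result only means these particular values match. The sketch does not inspect authentication settings such as RMAUTHTYPE or AuthType, so it cannot rule out other causes.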
Regards,
Hien Nguyen
Linux Technology Center (Austin)
Phone: (512) 838-4140 Tie Line: 678-4140
e-mail: hien1 at us.ibm.com