[Mauiusers] Re: Maui error messages
Rafael Folco
rfolco at linux.vnet.ibm.com
Wed Nov 19 12:55:00 MST 2008
I still haven't received an answer on this, so let me attach the log and
configs here.
The question is:
Everything works fine, so why does Maui report many errors like
"cannot query events on RM"?
canceljob, showstart, showq, and the other Maui commands all work fine, and I can
manage my SLURM jobs without problems.
Thanks in advance.
Rafael
On Mon, 2008-11-17 at 15:54 -0200, Rafael Folco wrote:
> Hi all,
>
> I have Maui/SLURM working fine and I am able to run SLURM jobs with the srun
> command. However, I see lots of errors in the maui.log file.
>
> 11/17 10:42:41 MRMCheckEvents()
> 11/17 10:42:41 ALERT: cannot query events on RM (RM 'cluster-ib-1' does not support function 'rmeventquery')
> 11/17 10:42:41 MSUAcceptClient(5,ClientSD,HostName,TCP)
> 11/17 10:42:41 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
> 11/17 10:42:41 INFO: all clients connected. servicing requests
>
> Any clue?
>
> Thanks,
>
> Rafael
--
Rafael Folco
Brazil Test Lead
IBM Linux Technology Center
E-Mail: rfolco at linux.vnet.ibm.com
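
To see how often each kind of message from the quoted excerpt shows up, the log lines can be tallied by level. This is only a rough sketch: the line layout ("MM/DD HH:MM:SS LEVEL: message") is inferred from the five lines quoted above, the "TRACE" bucket is a made-up label for lines that carry a function call instead of a level, and the names tally_levels and SAMPLE_LOG are hypothetical.

```python
import re
from collections import Counter

# The five lines from the quoted excerpt, with the mail-wrapped lines rejoined.
SAMPLE_LOG = """\
11/17 10:42:41 MRMCheckEvents()
11/17 10:42:41 ALERT: cannot query events on RM (RM 'cluster-ib-1' does not support function 'rmeventquery')
11/17 10:42:41 MSUAcceptClient(5,ClientSD,HostName,TCP)
11/17 10:42:41 INFO: accept call failed, errno: 11 (Resource temporarily unavailable)
11/17 10:42:41 INFO: all clients connected. servicing requests
"""

# Assumed layout: timestamp, then an upper-case level followed by a colon.
LEVEL_RE = re.compile(r"^\d{2}/\d{2} \d{2}:\d{2}:\d{2} ([A-Z]+):\s+(.*)$")


def tally_levels(text):
    """Count log lines per level; lines without a level go into 'TRACE'."""
    counts = Counter()
    for line in text.splitlines():
        match = LEVEL_RE.match(line)
        # 'TRACE' is a hypothetical label for function-call lines, not a Maui term.
        counts[match.group(1) if match else "TRACE"] += 1
    return counts


if __name__ == "__main__":
    for level, count in sorted(tally_levels(SAMPLE_LOG).items()):
        print(level, count)
```

Running the same function over the full maui.log (read with open(...).read()) would show whether the ALERT lines recur on every scheduling iteration or only occasionally.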
-------------- next part --------------
# maui.cfg 3.2.6p20
SERVERHOST cluster1
# primary admin must be first in list
ADMIN1 root slurm
# Resource Manager Definition
RMCFG[cluster1] TYPE=WIKI
RMPORT 7321
RMHOST cluster1
RMAUTHTYPE[cluster1] NONE
# Allocation Manager Definition
AMCFG[bank] TYPE=NONE
# full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
# use the 'schedctl -l' command to display current configuration
RMPOLLINTERVAL 00:00:05
SERVERPORT 42559
SERVERMODE NORMAL
# Admin: http://supercluster.org/mauidocs/a.esecurity.html
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 9
# Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
QUEUETIMEWEIGHT 1
# FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
#FSPOLICY PSDEDICATED
#FSDEPTH 7
#FSINTERVAL 86400
#FSDECAY 0.80
# Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
# NONE SPECIFIED
# Backfill: http://supercluster.org/mauidocs/8.2backfill.html
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
# Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
NODEALLOCATIONPOLICY MINRESOURCE
# QOS: http://supercluster.org/mauidocs/7.3qos.html
# QOSCFG[hi] PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
# QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
# Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
# SRSTARTTIME[test] 8:00:00
# SRENDTIME[test] 17:00:00
# SRDAYS[test] MON TUE WED THU FRI
# SRTASKCOUNT[test] 20
# SRMAXTIME[test] 0:30:00
# Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
# USERCFG[DEFAULT] FSTARGET=25.0
# USERCFG[john] PRIORITY=100 FSTARGET=10.0-
# GROUPCFG[staff] PRIORITY=1000 QLIST=hi:low QDEF=hi
# CLASSCFG[batch] FLAGS=PREEMPTEE
# CLASSCFG[interactive] FLAGS=PREEMPTOR
PARTITIONMODE ON
NODECFG[cluster1] PARTITION=openhpc
NODECFG[cluster2] PARTITION=openhpc
NODECFG[cluster3] PARTITION=openhpc
NODECFG[cluster4] PARTITION=openhpc
NODECFG[cluster5] PARTITION=openhpc
-------------- next part --------------
# slurm.conf file generated by configurator.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=cluster1
ControlAddr=10.1.1.10
#BackupController=
#BackupAddr=
#
AuthType=auth/munge
CacheGroups=0
#CheckpointType=checkpoint/none
CryptoType=crypto/munge
#DisableRootJobs=NO
#EnforcePartLimits=NO
#Epilog=
#FirstJobId=1
JobCredentialPrivateKey=/etc/slurm/slurm.key
JobCredentialPublicCertificate=/etc/slurm/slurm.cert
#JobFileAppend=0
#JobRequeue=1
#Licenses=foo*4,bar
#MailProg=/bin/mail
#MaxJobCount=5000
MpiDefault=none
#PluginDir=
#PlugStackConfig=
#PrivateData=jobs
ProctrackType=proctrack/pgid
#Prolog=
#PropagatePrioProcess=0
#PropagateResourceLimits=
#PropagateResourceLimitsExcept=
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/tmp/slurmd
SlurmUser=slurm
#SrunEpilog=
#SrunProlog=
StateSaveLocation=/tmp
SwitchType=switch/none
#TaskEpilog=
TaskPlugin=task/none
#TaskPluginParam=
#TaskProlog=
#TmpFs=/tmp
#TreeWidth=
#UnkillableStepProgram=
#UnkillableStepTimeout=
#UsePAM=0
#
#
# TIMERS
#EpilogMsgTime=2000
#GetEnvTimeout=2
#HealthCheckInterval=0
#HealthCheckProgram=
InactiveLimit=0
MinJobAge=300
KillWait=30
#MessageTimeout=10
SlurmctldTimeout=300
SlurmdTimeout=300
#UnkillableStepProgram=
#UnkillableStepTimeout=60
Waittime=0
#
#
# SCHEDULING
#DefMemPerCPU=0
FastSchedule=1
#MaxMemPerCPU=0
#SchedulerRootFilter=1
#SchedulerTimeSlice=30
SchedulerType=sched/wiki
SchedulerPort=7321
SelectType=select/linear
#SelectTypeParameters=
#
#
# LOGGING AND ACCOUNTING
#AccountingStorageEnforce=0
#AccountingStorageHost=
#AccountingStorageLoc=
#AccountingStoragePass=
#AccountingStoragePort=
AccountingStorageType=accounting_storage/none
#AccountingStorageUser=
ClusterName=cluster
#JobCompHost=
#JobCompLoc=
#JobCompPass=
#JobCompPort=
JobCompType=jobcomp/none
#JobCompUser=
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=9
SlurmctldLogFile=/tmp/slurm/log/slurmctld.log
SlurmdDebug=9
SlurmdLogFile=/tmp/slurm/log/slurmd.log
#
#
# POWER SAVE SUPPORT FOR IDLE NODES (optional)
#SuspendProgram=
#ResumeProgram=
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=cluster[1-5] Procs=1 State=UNKNOWN
PartitionName=openhpc Nodes=cluster[1-5] Default=YES MaxTime=INFINITE State=UP
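
As a quick sanity check, the Wiki-related settings in the two attached files can be compared mechanically. The sketch below assumes the simple layouts seen in these attachments (whitespace-separated "NAME value" pairs in maui.cfg, "Key=Value" lines in slurm.conf, "#" starting a comment); the function names are hypothetical, and it only confirms that the values agree. It does not explain the ALERT message.

```python
def parse_maui_cfg(text):
    """Map each maui.cfg parameter (first token) to the rest of its line."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        params[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return params


def parse_slurm_conf(text):
    """Map each slurm.conf key to its value, splitting on the first '='."""
    params = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def check_wiki_settings(maui, slurm):
    """Return a list of mismatches between the two parsed configurations."""
    problems = []
    if slurm.get("SchedulerType") != "sched/wiki":
        problems.append("SchedulerType is not sched/wiki")
    if maui.get("RMPORT") != slurm.get("SchedulerPort"):
        problems.append("RMPORT and SchedulerPort differ")
    if maui.get("RMHOST") != slurm.get("ControlMachine"):
        problems.append("RMHOST and ControlMachine differ")
    return problems


# Relevant lines copied from the attachments above.
MAUI_CFG = """\
RMCFG[cluster1] TYPE=WIKI
RMPORT 7321
RMHOST cluster1
"""

SLURM_CONF = """\
ControlMachine=cluster1
SchedulerType=sched/wiki
SchedulerPort=7321
"""

if __name__ == "__main__":
    issues = check_wiki_settings(parse_maui_cfg(MAUI_CFG), parse_slurm_conf(SLURM_CONF))
    print(issues or "Wiki host and port settings agree")
```

For the values posted here the check finds no mismatch, which is consistent with the Maui commands working as described.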
-------------- next part --------------
A non-text attachment was scrubbed...
Name: maui-errors.log
Type: text/x-log
Size: 22859 bytes
Desc: not available
Url : http://www.supercluster.org/pipermail/mauiusers/attachments/20081119/81aab193/maui-errors-0001.bin