[Mauiusers] Maui gets stopped when submit jobs to PBS

Josh Butikofer josh at clusterresources.com
Thu Dec 14 07:19:56 MST 2006


Berit,

Try running Maui under a debugger like gdb to see why it is shutting down. From your description,
I would guess Maui is crashing, most likely with a segmentation fault. Instructions on how to run
Maui under gdb can be found at
http://www.clusterresources.com/products/maui/docs/14.6troubleshootingsystemerrors.shtml, Section
14.6.2.1.
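
If you would rather script the gdb run than type the commands interactively, here is a minimal
Python sketch. It relies only on standard gdb options (-batch, -ex, --args). Two things in it are
assumptions rather than facts from this thread: the helper names are hypothetical, and the
MAUIDEBUG environment variable (which, as far as I recall, keeps Maui from detaching into the
background) should be checked against the troubleshooting page above before you rely on it.

```python
import os
import subprocess


def gdb_backtrace_command(maui_binary):
    """Build a gdb command line that runs Maui and prints a backtrace when it stops."""
    return [
        "gdb", "-batch",
        "-ex", "run",  # start Maui under the debugger
        "-ex", "bt",   # after it stops (e.g. on SIGSEGV), print the stack trace
        "--args", maui_binary,
    ]


def run_maui_under_gdb(maui_binary):
    """Run Maui under gdb and return everything gdb printed (stdout plus stderr)."""
    env = dict(os.environ)
    # Assumption: MAUIDEBUG keeps Maui in the foreground; verify this in the Maui docs.
    env["MAUIDEBUG"] = "yes"
    result = subprocess.run(
        gdb_backtrace_command(maui_binary),
        env=env,
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr
```

Call run_maui_under_gdb() with the full path to your maui binary, submit a job from another
shell, and the returned text should contain the backtrace if Maui crashes. If Maui exits
normally instead, gdb reports that there is no stack.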

Please send the stack trace to the list so we can get this fixed.
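
In the meantime, the maui.log excerpt quoted below already gives a hint: at LOGLEVEL 3 Maui
traces function entries, so the last traced call shows roughly how far the scheduler got before
it stopped. The following is a small sketch for pulling that out of a log. It assumes the
"MM/DD HH:MM:SS Function(args)" line format seen in the excerpt, and the function name
last_traced_call is hypothetical. Trace lines that the mailer wrapped onto a second line are
not matched.

```python
import re

# Matches trace lines such as "12/13 16:24:06 MStatUpdateActiveJobUsage(10)",
# optionally prefixed with the "> " quoting used in this mail.
TRACE_LINE = re.compile(
    r"^(?:>\s*)?\d{2}/\d{2} \d{2}:\d{2}:\d{2} (\w+)\((.*)\)\s*$"
)


def last_traced_call(log_lines):
    """Return (function_name, arguments) of the last trace line, or None if there is none."""
    last = None
    for line in log_lines:
        match = TRACE_LINE.match(line)
        if match:
            last = (match.group(1), match.group(2))
    return last
```

Applied to the excerpt below, the last traced call is MStatUpdateActiveJobUsage(10), so the
crash may be in or shortly after that routine. Buffered log output can hide later lines, so
treat this only as a starting point; the gdb stack trace is still what we need.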

Thanks,

-- 
Joshua Butikofer
Cluster Resources, Inc.

josh at clusterresources.com
Voice: (801) 717-3707
Fax:   (801) 717-3738
--------------------------


Berit Hinnemann wrote:
> Hi all,
> 
> I am new to installing Torque PBS and Maui. My system is a single dual-processor,
> dual-core server for testing purposes, where I try things out before getting the
> actual cluster. I have installed Torque PBS and this seems to work fine.
> Then I installed Maui and used the maui.cfg file shown below; aside from specifying
> that the queue system is PBS, I did not change anything.
> 
> Now the behavior is that I can start the 'maui' daemon, issue 'showq' and see the
> queue, but when I submit a job, the maui daemon seems to stop by itself. Then,
> when I issue "showq" I get
> 
> [behi@RHE4Server 1proc]$ showq
> ERROR:    cannot send request to server localhost.localdomain:42559 (server may 
> not be running)
> ERROR:    cannot request service (status)
> 
> I have appended the lines generated in maui.log below.
> The job runs fine and I can also submit several jobs, which are just done in the 
> order submitted. I can also restart maui and repeat this procedure.
> 
> Does anybody have an idea where I should be looking to figure out what is wrong?
> I would be grateful for any hints on how to get started.
> Best, Berit
> 
> --------------------------------------
> Berit Hinnemann
> Research Scientist
> Haldor Topsøe A/S
> ---------------------------------------
> -------------------------------------------------------------------------------------------------------------------------------------
> output from maui.log upon submitting a job
> 12/13 16:23:35 INFO:     scheduling complete.  sleeping 30 seconds
> 12/13 16:24:06 ServerProcessRequests()
> 12/13 16:24:06 INFO:     not rolling logs (585245 < 10000000)
> 12/13 16:24:06 MResAdjust(NULL,0,0)
> 12/13 16:24:06 MStatInitializeActiveSysUsage()
> 12/13 16:24:06 MStatClearUsage([NONE],Active)
> 12/13 16:24:06 ServerUpdate()
> 12/13 16:24:06 MSysUpdateTime()
> 12/13 16:24:06 INFO:     starting iteration 7
> 12/13 16:24:06 MRMGetInfo()
> 12/13 16:24:06 MClusterClearUsage()
> 12/13 16:24:06 MRMClusterQuery()
> 12/13 16:24:06 MPBSClusterQuery(localhost.localdomain,RCount,SC)
> 12/13 16:24:06 __MPBSGetNodeState(Name,State,PNode)
> 12/13 16:24:06 INFO:     PBS node localhost.localdomain set to state Busy 
> (job-exclusive)
> 12/13 16:24:06 INFO:     node 'localhost.localdomain' changed states from Idle 
> to Busy
> 12/13 16:24:06 ALERT:    unexpected node transition on node 
> 'localhost.localdomain'  Idle -> Busy
> 12/13 16:24:06 
> MPBSNodeUpdate(localhost.localdomain,localhost.localdomain,Busy,localhost.localdomain)
> 12/13 16:24:06 INFO:     node localhost.localdomain has joblist 
> '0/10.localhost.localdomain, 1/10.localhost.localdomain, 
> 2/10.localhost.localdomain, 3/10.localhost.localdomain'
> 12/13 16:24:06 ALERT:    cannot locate PBS job '10.localhost.localdomain' 
> (running on node localhost.localdomain)
> 12/13 16:24:06 ALERT:    cannot locate PBS job '10.localhost.localdomain' 
> (running on node localhost.localdomain)
> 12/13 16:24:06 ALERT:    cannot locate PBS job '10.localhost.localdomain' 
> (running on node localhost.localdomain)
> 12/13 16:24:06 ALERT:    cannot locate PBS job '10.localhost.localdomain' 
> (running on node localhost.localdomain)
> 12/13 16:24:06 MPBSLoadQueueInfo(localhost.localdomain,localhost.localdomain,SC)
> 12/13 16:24:06 INFO:     queue 'batch' started state set to True
> 12/13 16:24:06 INFO:     class to node not mapping enabled for queue 'batch' 
> adding class to all nodes
> 12/13 16:24:06 INFO:     1 PBS resources detected on RM localhost.localdomain
> 12/13 16:24:06 INFO:     resources detected: 1
> 12/13 16:24:06 MRMWorkloadQuery()
> 12/13 16:24:06 MPBSWorkloadQuery(localhost.localdomain,JCount,SC)
> 12/13 16:24:06 MPBSJobLoad(10,10.localhost.localdomain,J,TaskList,0)
> 12/13 16:24:06 MReqCreate(10,SrcRQ,DstRQ,DoCreate)
> 12/13 16:24:06 INFO:     processing node request line '1:ppn=4'
> 12/13 16:24:06 MJobSetCreds(10,behi,behi,)
> 12/13 16:24:06 INFO:     default QOS for job 10 set to DEFAULT(0) 
> (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/13 16:24:06 INFO:     default QOS for job 10 set to DEFAULT(0) 
> (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/13 16:24:06 INFO:     default QOS for job 10 set to DEFAULT(0) 
> (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/13 16:24:06 MResJCreate(10,MNodeList,-00:00:10,ActiveJob,Res)
> 12/13 16:24:06 MStatUpdateActiveJobUsage(10)
> ---------------------------------------------------------------------------------------------------------------------------------------
> maui.cfg
> # maui.cfg 3.2.6p18
> 
> SERVERHOST            localhost.localdomain
> # primary admin must be first in list
> ADMIN1                root
> 
> # Resource Manager Definition
> 
> RMCFG[localhost.localdomain] TYPE=PBS
> 
> # Allocation Manager Definition
> 
> AMCFG[bank]  TYPE=NONE
> 
> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
> # use the 'schedctl -l' command to display current configuration
> 
> RMPOLLINTERVAL        00:00:30
> 
> SERVERPORT            42559
> SERVERMODE            NORMAL
> 
> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
> 
> 
> LOGFILE               maui.log
> LOGFILEMAXSIZE        10000000
> LOGLEVEL              3
> 
> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
> 
> QUEUETIMEWEIGHT       1
> 
> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
> 
> #FSPOLICY              PSDEDICATED
> #FSDEPTH               7
> #FSINTERVAL            86400
> #FSDECAY               0.80
> 
> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
> 
> # NONE SPECIFIED
> 
> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
> 
> BACKFILLPOLICY        FIRSTFIT
> RESERVATIONPOLICY     CURRENTHIGHEST
> 
> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
> 
> NODEALLOCATIONPOLICY  MINRESOURCE
> 
> # QOS: http://supercluster.org/mauidocs/7.3qos.html
> 
> # QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
> 
> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
> 
> # SRSTARTTIME[test] 8:00:00
> # SRENDTIME[test]   17:00:00
> # SRDAYS[test]      MON TUE WED THU FRI
> # SRTASKCOUNT[test] 20
> # SRMAXTIME[test]   0:30:00
> 
> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
> 
> # USERCFG[DEFAULT]      FSTARGET=25.0
> # USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
> # GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
> # CLASSCFG[batch]       FLAGS=PREEMPTEE
> # CLASSCFG[interactive] FLAGS=PREEMPTOR
> 
> 
> 

