[Mauiusers] Maui gets stopped when submit jobs to PBS

rishi pathak mailmaverick666 at gmail.com
Thu Dec 14 23:53:19 MST 2006


Hi Berit,
  I had the same issue. As you can see from the log, a segmentation fault
occurs when Maui updates the status of previous jobs in the
MStatUpdateActiveJobUsage() function. To resolve this, try restarting the
PBS server with the option 'pbs_server -t cold' to remove the previous jobs,
then start Maui. Remember to start Maui only after you have started the PBS
server. If the previous jobs do not get deleted with the above option, try
using pbs_sched and remove all the jobs, then restart the server and then
start Maui.
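
The order of the steps above can be sketched as a small dry-run script. This is only a sketch under assumptions: that the running server is stopped with `qterm` (not mentioned above), that the cold start is spelled `pbs_server -t cold`, and that the binaries are on PATH for a user allowed to manage both daemons. The names `RESTART_STEPS` and `run_steps` are made up for illustration.

```python
import shlex
import subprocess

# Assumption: qterm, pbs_server and maui are on PATH and the script is run
# by a user who may manage both daemons (typically root).
RESTART_STEPS = [
    ("stop the running PBS server (assumed step)", ["qterm"]),
    ("cold-start the PBS server to drop previous jobs", ["pbs_server", "-t", "cold"]),
    ("start Maui only after the PBS server is up", ["maui"]),
]


def run_steps(steps, dry_run=True):
    """Print each command in order; execute it only when dry_run is False.

    Returns the list of command lines in the order they were issued.
    """
    issued = []
    for description, argv in steps:
        line = shlex.join(argv)
        issued.append(line)
        print(f"# {description}")
        print(line)
        if not dry_run:
            # check=True aborts the sequence if a step fails, so Maui is
            # never started against a server that did not come up.
            subprocess.run(argv, check=True)
    return issued


if __name__ == "__main__":
    run_steps(RESTART_STEPS)  # dry run: prints the commands without running them
```

By default nothing is executed, so the sequence can be reviewed first. A cold start discards the jobs the server knows about, so it only makes sense on a test system like this one.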

Hope this will help.
Regards,
 Rishi Pathak

On 12/13/06, Berit Hinnemann <behi at topsoe.dk> wrote:
>
> Hi all,
>
> I am new to installing Torque PBS and Maui. My system is a single
> dual-processor, dual-core server for testing purposes, where I try things out
> before getting the actual cluster. I have installed Torque PBS and this
> seems to work fine. Then I installed Maui and used the file maui.cfg as
> below; aside from specifying that the queue system is PBS, I did not change
> anything.
>
> Now the behavior is that I can start the 'maui' daemon, issue 'showq' and
> see the queue, but when I submit a job, the maui daemon seems to stop by
> itself. Then, when I issue 'showq' I get
>
> [behi@RHE4Server 1proc]$ showq
> ERROR:    cannot send request to server localhost.localdomain:42559 (server may not be running)
> ERROR:    cannot request service (status)
>
> I have appended the lines generated in maui.log below.
> The job runs fine and I can also submit several jobs, which are just done
> in the order submitted. I can also restart maui and repeat this procedure.
>
> Does anybody have an idea where I should be looking to figure out what is
> wrong? I would be grateful for any hints on how to get started.
> Best, Berit
>
> --------------------------------------
> Berit Hinnemann
> Research Scientist
> Haldor Topsøe A/S
> ---------------------------------------
>
> -------------------------------------------------------------------------------------------------------------------------------------
> output from maui.log upon submitting a job
> 12/13 16:23:35 INFO:     scheduling complete.  sleeping 30 seconds
> 12/13 16:24:06 ServerProcessRequests()
> 12/13 16:24:06 INFO:     not rolling logs (585245 < 10000000)
> 12/13 16:24:06 MResAdjust(NULL,0,0)
> 12/13 16:24:06 MStatInitializeActiveSysUsage()
> 12/13 16:24:06 MStatClearUsage([NONE],Active)
> 12/13 16:24:06 ServerUpdate()
> 12/13 16:24:06 MSysUpdateTime()
> 12/13 16:24:06 INFO:     starting iteration 7
> 12/13 16:24:06 MRMGetInfo()
> 12/13 16:24:06 MClusterClearUsage()
> 12/13 16:24:06 MRMClusterQuery()
> 12/13 16:24:06 MPBSClusterQuery(localhost.localdomain,RCount,SC)
> 12/13 16:24:06 __MPBSGetNodeState(Name,State,PNode)
> 12/13 16:24:06 INFO:     PBS node localhost.localdomain set to state Busy (job-exclusive)
> 12/13 16:24:06 INFO:     node 'localhost.localdomain' changed states from Idle to Busy
> 12/13 16:24:06 ALERT:    unexpected node transition on node 'localhost.localdomain'  Idle -> Busy
> 12/13 16:24:06 MPBSNodeUpdate(localhost.localdomain,localhost.localdomain,Busy,localhost.localdomain)
> 12/13 16:24:06 INFO:     node localhost.localdomain has joblist '0/10.localhost.localdomain, 1/10.localhost.localdomain, 2/10.localhost.localdomain, 3/10.localhost.localdomain'
> 12/13 16:24:06 ALERT:    cannot locate PBS job '10.localhost.localdomain' (running on node localhost.localdomain)
> 12/13 16:24:06 ALERT:    cannot locate PBS job '10.localhost.localdomain' (running on node localhost.localdomain)
> 12/13 16:24:06 ALERT:    cannot locate PBS job '10.localhost.localdomain' (running on node localhost.localdomain)
> 12/13 16:24:06 ALERT:    cannot locate PBS job '10.localhost.localdomain' (running on node localhost.localdomain)
> 12/13 16:24:06 MPBSLoadQueueInfo(localhost.localdomain,localhost.localdomain,SC)
> 12/13 16:24:06 INFO:     queue 'batch' started state set to True
> 12/13 16:24:06 INFO:     class to node not mapping enabled for queue 'batch' adding class to all nodes
> 12/13 16:24:06 INFO:     1 PBS resources detected on RM localhost.localdomain
> 12/13 16:24:06 INFO:     resources detected: 1
> 12/13 16:24:06 MRMWorkloadQuery()
> 12/13 16:24:06 MPBSWorkloadQuery(localhost.localdomain,JCount,SC)
> 12/13 16:24:06 MPBSJobLoad(10,10.localhost.localdomain,J,TaskList,0)
> 12/13 16:24:06 MReqCreate(10,SrcRQ,DstRQ,DoCreate)
> 12/13 16:24:06 INFO:     processing node request line '1:ppn=4'
> 12/13 16:24:06 MJobSetCreds(10,behi,behi,)
> 12/13 16:24:06 INFO:     default QOS for job 10 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/13 16:24:06 INFO:     default QOS for job 10 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/13 16:24:06 INFO:     default QOS for job 10 set to DEFAULT(0) (P:DEFAULT,U:[NONE],G:[NONE],A:[NONE],C:[NONE])
> 12/13 16:24:06 MResJCreate(10,MNodeList,-00:00:10,ActiveJob,Res)
> 12/13 16:24:06 MStatUpdateActiveJobUsage(10)
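
The log stops right after this call, which is why I suspect MStatUpdateActiveJobUsage(). A quick way to check where a maui.log ends is sketched below. It assumes trace lines have the 'MM/DD HH:MM:SS Name(args)' shape seen in this excerpt; the helper name `last_traced_call` is made up. It only reports the last function that was logged, it does not prove that the crash happened there.

```python
import re

# A trace line is a timestamp followed directly by a function name and "(".
# INFO:/ALERT: lines do not match because the name is followed by ":".
TRACE = re.compile(r"^\d{2}/\d{2} \d{2}:\d{2}:\d{2} (\w+)\(")


def last_traced_call(log_text):
    """Return the name of the last function-call trace line, or None."""
    last = None
    for line in log_text.splitlines():
        match = TRACE.match(line)
        if match:
            last = match.group(1)
    return last


# Last few lines of the excerpt, without the mail quoting prefix.
SAMPLE = """\
12/13 16:24:06 MJobSetCreds(10,behi,behi,)
12/13 16:24:06 INFO:     default QOS for job 10 set to DEFAULT(0)
12/13 16:24:06 MResJCreate(10,MNodeList,-00:00:10,ActiveJob,Res)
12/13 16:24:06 MStatUpdateActiveJobUsage(10)
"""

if __name__ == "__main__":
    print(last_traced_call(SAMPLE))
```

To use it on a real log, read the file (for example with open("maui.log").read()) and pass the text in. If the same function is the last entry every time Maui dies, that is a good hint about where to look.
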
>
> ---------------------------------------------------------------------------------------------------------------------------------------
> maui.cfg
> # maui.cfg 3.2.6p18
>
> SERVERHOST            localhost.localdomain
> # primary admin must be first in list
> ADMIN1                root
>
> # Resource Manager Definition
>
> RMCFG[localhost.localdomain] TYPE=PBS
>
> # Allocation Manager Definition
>
> AMCFG[bank]  TYPE=NONE
>
> # full parameter docs at http://supercluster.org/mauidocs/a.fparameters.html
> # use the 'schedctl -l' command to display current configuration
>
> RMPOLLINTERVAL        00:00:30
>
> SERVERPORT            42559
> SERVERMODE            NORMAL
>
> # Admin: http://supercluster.org/mauidocs/a.esecurity.html
>
>
> LOGFILE               maui.log
> LOGFILEMAXSIZE        10000000
> LOGLEVEL              3
>
> # Job Priority: http://supercluster.org/mauidocs/5.1jobprioritization.html
>
> QUEUETIMEWEIGHT       1
>
> # FairShare: http://supercluster.org/mauidocs/6.3fairshare.html
>
> #FSPOLICY              PSDEDICATED
> #FSDEPTH               7
> #FSINTERVAL            86400
> #FSDECAY               0.80
>
> # Throttling Policies: http://supercluster.org/mauidocs/6.2throttlingpolicies.html
>
> # NONE SPECIFIED
>
> # Backfill: http://supercluster.org/mauidocs/8.2backfill.html
>
> BACKFILLPOLICY        FIRSTFIT
> RESERVATIONPOLICY     CURRENTHIGHEST
>
> # Node Allocation: http://supercluster.org/mauidocs/5.2nodeallocation.html
>
> NODEALLOCATIONPOLICY  MINRESOURCE
>
> # QOS: http://supercluster.org/mauidocs/7.3qos.html
>
> # QOSCFG[hi]  PRIORITY=100 XFTARGET=100 FLAGS=PREEMPTOR:IGNMAXJOB
> # QOSCFG[low] PRIORITY=-1000 FLAGS=PREEMPTEE
>
> # Standing Reservations: http://supercluster.org/mauidocs/7.1.3standingreservations.html
>
> # SRSTARTTIME[test] 8:00:00
> # SRENDTIME[test]   17:00:00
> # SRDAYS[test]      MON TUE WED THU FRI
> # SRTASKCOUNT[test] 20
> # SRMAXTIME[test]   0:30:00
>
> # Creds: http://supercluster.org/mauidocs/6.1fairnessoverview.html
>
> # USERCFG[DEFAULT]      FSTARGET=25.0
> # USERCFG[john]         PRIORITY=100  FSTARGET=10.0-
> # GROUPCFG[staff]       PRIORITY=1000 QLIST=hi:low QDEF=hi
> # CLASSCFG[batch]       FLAGS=PREEMPTEE
> # CLASSCFG[interactive] FLAGS=PREEMPTOR
>
>
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
>
>
>