[Mauiusers] maui segfaults trying to schedule a job

DuChene, StevenX A stevenx.a.duchene at intel.com
Mon Nov 28 17:04:44 MST 2011


This morning I discovered that the maui scheduler process was not running on one of our clusters as it should be. When I try to start the maui process as the maui user, I get a segmentation fault. The last few entries in the log file look like this:

11/28 15:45:24 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
11/28 15:45:24 INFO:     job '231' Priority:      605
11/28 15:45:24 INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:    605(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
11/28 15:45:24 MStatClearUsage([NONE],Active)
11/28 15:45:24 INFO:     total jobs selected (ALL): 1/1
11/28 15:45:24 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
11/28 15:45:24 INFO:     job '231' Priority:      605
11/28 15:45:24 INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:      0(00.0)  Serv:    605(00.0)  Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
11/28 15:45:24 MStatClearUsage([NONE],Idle)
11/28 15:45:24 INFO:     total jobs selected (ALL): 1/1
11/28 15:45:24 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
11/28 15:45:24 INFO:     total jobs selected in partition ALL: 1/1
11/28 15:45:24 MQueueScheduleRJobs(Q)
11/28 15:45:24 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
11/28 15:45:24 INFO:     total jobs selected in partition ALL: 1/1
11/28 15:45:24 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
11/28 15:45:24 INFO:     total jobs selected in partition DEFAULT: 1/1
11/28 15:45:24 MQueueScheduleIJobs(Q,DEFAULT)
11/28 15:45:24 INFO:     156 feasible tasks found for job 231:0 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO:     156 feasible tasks found for job 231:1 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO:     156 feasible tasks found for job 231:2 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO:     156 feasible tasks found for job 231:3 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO:     156 feasible tasks found for job 231:4 in partition DEFAULT (16 Needed)

Prior to the above entries there are a whole bunch of entries similar to those shown below:

11/28 15:45:24 MUGetIndex(TJC,ValList,0)
11/28 15:45:24 MUGetIndex(TNJA,ValList,0)
11/28 15:45:24 MUGetIndex(TNJC,ValList,0)
11/28 15:45:24 MUGetIndex(TNXF,ValList,0)
11/28 15:45:24 MUGetIndex(TPSD,ValList,0)
11/28 15:45:24 MUGetIndex(TPSE,ValList,0)
11/28 15:45:24 MUGetIndex(TPSR,ValList,0)
11/28 15:45:24 MUGetIndex(TPSU,ValList,0)
11/28 15:45:24 MUGetIndex(TQM,ValList,0)
11/28 15:45:24 MUGetIndex(TQT,ValList,0)
11/28 15:45:24 MUGetIndex(TRT,ValList,0)
11/28 15:45:24 MUGetIndex(TXF,ValList,0)

There is only this one job in the queue on a 256-node cluster running torque 2.5.7 and maui 3.2.6p21.

I have tried starting the maui process under strace, but I do not see any smoking gun in the strace output.

I can probably get maui to start if I qdel the job, but I was hoping to find out what is causing the problem first, in case any additional debugging output is needed.
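
For anyone triaging a similar log, here is a minimal Python sketch that tallies the "feasible tasks" lines per job requirement. It is not part of Maui. The regular expression and the function name summarize_feasible are assumptions based only on the line format quoted in this message, so other versions or log levels may need a different pattern.

```python
import re
from collections import defaultdict

# Pattern inferred from the log lines quoted in this message (an assumption);
# it is not taken from the Maui source or documentation.
FEASIBLE_RE = re.compile(
    r"INFO:\s+(\d+) feasible tasks found for job (\S+?):(\d+) "
    r"in partition (\S+) \((\d+) Needed\)"
)

def summarize_feasible(lines):
    """Return {job_id: [(req_index, feasible, needed), ...]} (hypothetical helper)."""
    summary = defaultdict(list)
    for line in lines:
        m = FEASIBLE_RE.search(line)
        if m:
            feasible, job, req, _partition, needed = m.groups()
            summary[job].append((int(req), int(feasible), int(needed)))
    return dict(summary)

# The five lines from the log excerpt above.
sample = [
    "11/28 15:45:24 INFO:     156 feasible tasks found for job 231:0 in partition DEFAULT (39 Needed)",
    "11/28 15:45:24 INFO:     156 feasible tasks found for job 231:1 in partition DEFAULT (39 Needed)",
    "11/28 15:45:24 INFO:     156 feasible tasks found for job 231:2 in partition DEFAULT (39 Needed)",
    "11/28 15:45:24 INFO:     156 feasible tasks found for job 231:3 in partition DEFAULT (39 Needed)",
    "11/28 15:45:24 INFO:     156 feasible tasks found for job 231:4 in partition DEFAULT (16 Needed)",
]

result = summarize_feasible(sample)
for job, reqs in result.items():
    total_needed = sum(needed for _, _, needed in reqs)
    print(job, len(reqs), total_needed)  # job id, number of requirements, total tasks needed
```

For the five lines quoted above, the "Needed" counts sum to 172, while each requirement reports 156 feasible tasks. Whether that has anything to do with the segfault is not established here.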
--
Steven DuChene