[Mauiusers] maui segfaults trying to schedule a job
DuChene, StevenX A
stevenx.a.duchene at intel.com
Mon Nov 28 17:28:58 MST 2011
BTW, I just tried upgrading to maui-3.3.1 and I still have the same issue. Maui segfaults when I try to start the maui process with this one job in the queue.
--
Steven DuChene
From: DuChene, StevenX A
Sent: Monday, November 28, 2011 4:05 PM
To: mauiusers at supercluster.org
Subject: maui segfaults trying to schedule a job
This morning I discovered that the maui scheduler process was not running on one of our clusters like it should. When I try to start the maui process as the maui user, I get a segmentation fault. Checking the log file, I see that the last few entries look like this:
11/28 15:45:24 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
11/28 15:45:24 INFO: job '231' Priority: 605
11/28 15:45:24 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 605(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
11/28 15:45:24 MStatClearUsage([NONE],Active)
11/28 15:45:24 INFO: total jobs selected (ALL): 1/1
11/28 15:45:24 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
11/28 15:45:24 INFO: job '231' Priority: 605
11/28 15:45:24 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 605(00.0) Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
11/28 15:45:24 MStatClearUsage([NONE],Idle)
11/28 15:45:24 INFO: total jobs selected (ALL): 1/1
11/28 15:45:24 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
11/28 15:45:24 INFO: total jobs selected in partition ALL: 1/1
11/28 15:45:24 MQueueScheduleRJobs(Q)
11/28 15:45:24 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
11/28 15:45:24 INFO: total jobs selected in partition ALL: 1/1
11/28 15:45:24 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
11/28 15:45:24 INFO: total jobs selected in partition DEFAULT: 1/1
11/28 15:45:24 MQueueScheduleIJobs(Q,DEFAULT)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:0 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:1 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:2 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:3 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:4 in partition DEFAULT (16 Needed)
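The last five entries show job 231 being evaluated as what appear to be five separate task requirements (231:0 through 231:4). As a minimal sketch, assuming the log line format shown above, the following Python pulls those lines out and totals the needed tasks. `tally_feasible` is a hypothetical helper, not part of maui, and the sample lines are copied verbatim from this log.

```python
import re

# Matches lines such as:
# "INFO: 156 feasible tasks found for job 231:0 in partition DEFAULT (39 Needed)"
# Groups: feasible count, job id, requirement index, partition, needed count.
PATTERN = re.compile(
    r"INFO: (\d+) feasible tasks found for job (\S+):(\d+) "
    r"in partition (\S+) \((\d+) Needed\)"
)


def tally_feasible(lines):
    """Return a list of (req, feasible, needed) tuples and the total needed."""
    reqs = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            feasible, _job, req, _partition, needed = m.groups()
            reqs.append((int(req), int(feasible), int(needed)))
    total_needed = sum(needed for _, _, needed in reqs)
    return reqs, total_needed


LOG = """\
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:0 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:1 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:2 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:3 in partition DEFAULT (39 Needed)
11/28 15:45:24 INFO: 156 feasible tasks found for job 231:4 in partition DEFAULT (16 Needed)
"""

reqs, total = tally_feasible(LOG.splitlines())
for req, feasible, needed in reqs:
    print(f"req {req}: {feasible} feasible, {needed} needed")
print("total needed:", total)
```

The requirements ask for 39 + 39 + 39 + 39 + 16 = 172 tasks in total, while each line reports 156 feasible tasks. The log alone does not show whether that difference has anything to do with the segfault.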
Earlier in the log, before the MQueueSelectAllJobs entries above, there are a WHOLE BUNCH of entries similar to these:
11/28 15:45:24 MUGetIndex(TJC,ValList,0)
11/28 15:45:24 MUGetIndex(TNJA,ValList,0)
11/28 15:45:24 MUGetIndex(TNJC,ValList,0)
11/28 15:45:24 MUGetIndex(TNXF,ValList,0)
11/28 15:45:24 MUGetIndex(TPSD,ValList,0)
11/28 15:45:24 MUGetIndex(TPSE,ValList,0)
11/28 15:45:24 MUGetIndex(TPSR,ValList,0)
11/28 15:45:24 MUGetIndex(TPSU,ValList,0)
11/28 15:45:24 MUGetIndex(TQM,ValList,0)
11/28 15:45:24 MUGetIndex(TQT,ValList,0)
11/28 15:45:24 MUGetIndex(TRT,ValList,0)
11/28 15:45:24 MUGetIndex(TXF,ValList,0)
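To put a number on how many of these repeated entries there are, here is a small sketch that counts log lines per traced function name. It assumes the `MM/DD HH:MM:SS FunctionName(args)` format seen in this log; `count_calls` is a hypothetical helper, not a maui tool, and the sample lines are taken from the excerpts in this message.

```python
import re
from collections import Counter

# A traced call looks like "11/28 15:45:24 MUGetIndex(TJC,ValList,0)":
# date, time, then an identifier immediately followed by "(".
# "INFO:" lines do not match because "INFO" is followed by ":" not "(".
CALL = re.compile(r"^\d\d/\d\d \d\d:\d\d:\d\d (\w+)\(")


def count_calls(lines):
    """Count log lines per traced function name, skipping INFO lines."""
    counts = Counter()
    for line in lines:
        m = CALL.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


sample = [
    "11/28 15:45:24 MUGetIndex(TJC,ValList,0)",
    "11/28 15:45:24 MUGetIndex(TNJA,ValList,0)",
    "11/28 15:45:24 MQueueScheduleIJobs(Q,DEFAULT)",
    "11/28 15:45:24 INFO: job '231' Priority: 605",
]
print(count_calls(sample).most_common())
```

For the real log, pass an open file object instead of the sample list, e.g. `with open(path) as f: print(count_calls(f).most_common(10))`, where `path` is wherever your maui LOGFILE setting points (no specific path is assumed here).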
There is only this one job in the queue, on a 256-node cluster running torque 2.5.7 and maui 3.2.6p21.
I have tried starting the maui process under strace, but I do not see any smoking gun in the strace output.
I can probably get maui to start if I qdel the job, but I was hoping to find out what is causing the problem first, in case any additional debugging output is needed.
--
Steven DuChene