[Mauiusers] RE: Problem with MAUI and partition(PartitionAccess)

Balle, Susanne susanne.balle at hp.com
Tue Dec 14 08:49:26 MST 2004


I am not sure if this helps, but I looked at Mpolicy.c with gdb, and the
statement

if ((P->Index > 1) && (MUBMCheck(P->Index, J->PAL) == FALSE))

is true.

I have printed *P and *J below.

Does anything look wrong in *J and *P?

Thanks Susanne

(gdb) p *P
$11 = {Name = "DEFAULT", '\0' <repeats 57 times>, Index = 1, ConfigNodes
= 4, IdleNodes = 4,
ActiveNodes = 0, UpNodes = 4, CRes = {Procs = 10, Mem = 12756, Swap = 0,
Disk = 44716, PSlot = {{
count = 0} <repeats 16 times>}, GRes = {{count = 0}, {count = 0}, {count
= 0}, {count = 0}}},
URes = {Procs = 10, Mem = 12756, Swap = 0, Disk = 44716, PSlot = {{count
= 0} <repeats 16 times>},
GRes = {{count = 0}, {count = 0}, {count = 0}, {count = 0}}}, ARes =
{Procs = 10, Mem = 12756,
Swap = 0, Disk = 44716, PSlot = {{count = 0} <repeats 16 times>}, GRes =
{{count = 0}, {
count = 0}, {count = 0}, {count = 0}}}, DRes = {Procs = 0, Mem = 0, Swap
= 0, Disk = 0,
PSlot = {{count = 0} <repeats 16 times>}, GRes = {{count = 0}, {count =
0}, {count = 0}, {
count = 0}}}, S = {Count = 0, NCount = 0, JobCountSubmitted = 0,
JobCountSuccessful = 0,
QOSMet = 0, RejectionCount = 0, TotalQTS = 0, MaxQTS = 0, TotalQueuedPH
= 0,
TotalRequestTime = 0, TotalRunTime = 0, PSRequest = 0,
PSRequestSubmitted = 0, PSRun = 0,
PSRunSuccessful = 0, PSDedicated = 0, PSUtilized = 0, MSAvail = 0,
MSUtilized = 0,
MSDedicated = 0, PS2Dedicated = 0, PS2Utilized = 0, JobAcc = 0, NJobAcc
= 0, XFactor = 0,
NXFactor = 0, PSXFactor = 0, MaxXFactor = 0, Bypass = 0, MaxBypass = 0,
BFCount = 0, BFPSRun = 0,
Accuracy = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, DStat = {{Data = 0x0, DSize =
0, Count = 0, Size = 0},
{Data = 0x0, DSize = 0, Count = 0, Size = 0}, {Data = 0x0, DSize = 0,
Count = 0, Size = 0}, {
 Data = 0x0, DSize = 0, Count = 0, Size = 0}, {Data = 0x0, DSize = 0,
Count = 0, Size = 0}, {
 Data = 0x0, DSize = 0, Count = 0, Size = 0}, {Data = 0x0, DSize = 0,
Count = 0, Size = 0}, {
 Data = 0x0, DSize = 0, Count = 0, Size = 0}}}, FSC = {PCW = {-1, -1,
-1, -1, -1, -1, -1, -1},
PCP = {0, 0, 0, 0, 0, 0, 0, 0}, PSW = {-1 <repeats 32 times>}, PSP = {0
<repeats 32 times>},
PCC = {0, 0, 0, 0, 0, 0, 0, 0}, PSC = {0 <repeats 32 times>},
XFMinWCLimit = 0, FSPolicy = 0,
FSInterval = 0, FSDepth = 0, FSDecay = 0}, L = {AP = {Usage = {{0, 0, 0,
0}, {0, 0, 0, 0}, {0, 0,
 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0,
0, 0}, {0, 0, 0, 0}},
SLimit = {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0,
0, 0}, {0, 0, 0, 0}, {
   0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, HLimit = {{0, 0, 0, 0}, {0,
0, 0, 0}, {0, 0, 0,
   0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0,
0}, {0, 0, 0, 0}}},
IP = 0x2fdcfd0, JP = 0x2fdd190, OAP = 0x0, OIP = 0x0, OJP = 0x0, APU =
0x0, APC = 0x0, APG = 0x0,
APQ = 0x0, JDef = 0x0, JMax = 0x0, JMin = 0x0}, F = {JobFlags = 0,
Priority = 0,
IsLocalPriority = 0 '\0', Overrun = 0, ADef = 0x0, AAL = 0x0, QDef =
0x1120760, QAL = {1, 0, 0,
   0, 0}, QALType = 0, PDef = 0xc8e790, PAL = {16777215}, PALType = 0,
FSTarget = 0, FSMode = 0,
FSFactor = 0, FSUsage = {0 <repeats 24 times>}, IsInitialized = 0},
BFPolicy = 1, BFDepth = 0,
BFMetric = 1, BFProcFactor = 0, BFMaxSchedules = 10000, BFPriorityPolicy
= 0, BFChunkSize = 0,
BFChunkDuration = 0, BFChunkBlockTime = 0, MaxMetaTasks = 0,
MaxJobStartTime = 2140000000,
NAllocPolicy = 1, DistPolicy = 1, JobSizePolicy = 0, JobNodeMatch = 0,
ResPolicy = 0,
ResRetryTime = 0, ResTType = 0, ResTValue = 0, ResDepth = {1, 0 <repeats
127 times>}, ResCount = {
  0 <repeats 128 times>}, ResQOSList = {{0x80, 0x0 <repeats 127 times>}
<repeats 128 times>},
UseMachineSpeed = 0, UseSystemQueueTime = 1, UseCPUTime = 0,
RejectNegPrioJobs = 0,
EnableNegJobPriority = 0, EnableMultiNodeJobs = 1, EnableMultiReqJobs =
0,
JobPrioAccrualPolicy = 2, NAvailPolicy = "\001\000\000\000\000",
ResourceLimitPolicy = {mrlpNONE,
  mrlpNONE, mrlpNONE, mrlpNONE, mrlpNONE, mrlpNONE},
ResourceLimitViolationAction = {mrlaRequeue,
  mrlaRequeue, mrlaRequeue, mrlaRequeue, mrlaRequeue, mrlaRequeue},
ResourceLimitMaxViolationTime = {0, 0, 0, 0, 0, 0}, AdminMinSTime = 0,
UseLocalMachinePriority = 0,
NodeLoadPolicy = 1, NodeSetPolicy = 0, NodeSetAttribute = 0, NodeSetList
= {
  0x0 <repeats 128 times>}, NodeSetDelay = 0, NodeSetTolerance = 0,
NodeSetPriorityType = 4,
UntrackedProcFactor = 1.2, NodeDownStateDelayTime = 3600, XRes = {{Name
= '\0' <repeats 63 times>,
   Type = 0, Func = 0, Data = 0x0} <repeats 5120 times>}, Message = 0x0}

(gdb) p *J
$13 = {Name = "15", '\0' <repeats 62 times>, AName = 0x0, RMJID = 0x0,
Index = 1, CTime = 1103055570,
 MTime = 1103055565, ATime = 1103055975, StateMTime = 0, SWallTime = 0,
AWallTime = 0, Ckpt = 0x0,
 Next = 0x2fc4250, Prev = 0x2fc4250, C = {TotalProcCount = 2}, RCL = {{
 Name = '\0' <repeats 63 times>, Value = 0, Flags = 0, Type = 0 '\0',
Cmp = 0 '\0',
 Affinity = 0 '\0'} <repeats 32 times>}, Cred = {U = 0x2ff51e0, G =
0x11d7880, A = 0x0, C = 0x0,
 Q = 0x1120760, CredType = 0, ACL = 0x2ffe3b0, CL = 0x3014f90, MTime =
0}, QReq = 0x0, R = 0x0,
 ResName = '\0' <repeats 63 times>, RAList = {'\0' <repeats 63 times>
<repeats 16 times>},
 RM = 0x13e02e0, SessionID = 0, NeedNodes = 0x0, SIData = 0x0, SOData =
0x0, Req = {0x3002ee0, 0x0,
  0x0, 0x0, 0x0}, GReq = 0x0, MReq = 0x0, E = {IWD = 0x0, Cmd = 0x0,
Input = 0x0, Output = 0x0,
  Error = 0x0, Args = 0x0, Env = 0x0, Shell = 0x0, JobType = 0x0, PID =
0, SID = 0}, NodeCount = 0,
 TaskCount = 0, TasksRequested = 0, NodesRequested = 0, Alloc = {TC = 0,
NC = 0}, Request = {TC = 2,
  NC = 1}, Geometry = 0x0, SpecDistPolicy = 0 '\0', DistPolicy = 0 '\0',
 SystemQueueTime = 1103055565, EffQueueDuration = 410, SMinTime = 0,
SpecSMinTime = 0,
 SysSMinTime = 0, CMaxTime = 2140000000, TermTime = 2140000000, RMinTime
= 0,
 SubmitTime = 1103055565, StartTime = 0, DispatchTime = 0,
CompletionTime = 0, LastNotifyTime = 0,
 CompletionCode = 0, StartCount = 0, DeferCount = 0, PreemptCount = 0,
Bypass = 0, HoldReason = 0,
 BlockReason = 0, SyncDeadLine = 2140000000, Message = 0x0, WCLimit =
1800, CPULimit = 0,
 SpecWCLimit = {1800, 1800, 0 <repeats 31 times>}, RemainingTime = 0,
SimWCTime = 0,
 MinMachineSpeed = 0, SpecPAL = {8}, PAL = {8}, ImageSize = 0, ExecSize
= 0, State = mjsIdle,
 EState = mjsIdle, IState = mjsNONE, Hold = 0, SuspendType = 0,
SystemPrio = 0, UPriority = 0,
 StartPriority = 6, RunPriority = 0, TaskMap = {-1, 0, 0, 0, 0, 0, 0, 0,
9, 0, 0, 0, 0, 0, 0, 0,
 269, 0 <repeats 11 times>, 12, 0, 0, 0, 21568, 137, 0, 0, -12096,
-27305, 42, 0, 19200, 767, 0,
 0, 0, 0, 0, 0, -20424, -16404, 127, 0, 1046, -160, -1, -1, 19154,
16831, 0, 0, -29465, 0, 0, 0,
 19154, 16831, 0, 0, -25411, -27256, 42, 0, -15216, -16404, 127, 0,
-26295, 73, 0, 0, 0, 0, 1,
 0 <repeats 33 times>, 19154, 16831, 0, 0, 19245, 767, 0 <repeats 2086
times>, 14132, 13667,
 26164, 13874, 14640, 25656, 12852, 25905, 0 <repeats 452 times>,
-12096, -27305, 42, 0, 0, 0, 0,
 0, -13904, -16404, 127, 0, -15232, -16404, 127, 0, 1, 0, 0, 0, -13632,
-16404, 127, 0, 7923, 76,
 0, 0, 0, 0, 0, 0, -13616, -16404, 127, 0, 32, 0, 0, 0, -16614, -27260,
42, 0, 48, 0, 0, 0, 0, 0,
 0, 0, 14132, 13667, 26164, 13874, 20792, -27258, 42, 0, 21504, 15699,
12593, 13104, 16400, 764,
 0, 0, 32, 0, 0, 0, -16384, -27306...}, NodeList = 0x2fed5b0, ReqHList =
0x0, ExcHList = 0x0,
 ReqHLMode = 0, RType = 0, Cluster = 0, Proc = 0, SubmitHost = '\0'
<repeats 63 times>, Flags = 0,
 SpecFlags = 0, SysFlags = 0, AttrBM = 0, MasterJobName = 0x0,
MasterHostName = 0x0, PSUtilized = 0,
 PSDedicated = 0, MSUtilized = 0, MSDedicated = 0, RMXString = 0x0,
RMSubmitString = 0x0,
 RMSubmitType = 0, SystemID = 0x0, SystemJID = 0x0, Depend = 0x0,
RULVTime = 0, FeatureMap = {0},
 ASFunc = 0, ASData = 0x0, xd = 0x0}
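
One way to read the two dumps above, as a sketch only: assuming MUBMCheck(Index, BM) tests whether bit number Index is set in the bitmap (an assumption; the real macro in the Maui source is not shown in this thread and may differ), the values P->Index = 1, J->PAL = {8} and the default PAL = {16777215} can be compared directly. The helper names bm_check and set_bits are hypothetical, and Python is used purely for illustration.

```python
# Hypothetical stand-in for MUBMCheck. It ASSUMES that bit N of the
# bitmap corresponds to partition index N; the real Maui macro may differ.
def bm_check(index, bitmap):
    return (bitmap >> index) & 1 == 1


def set_bits(bitmap):
    """Return the positions of all set bits, lowest first."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


P_INDEX = 1             # P->Index from "p *P"
J_PAL = 8               # J->PAL[0] from "p *J"
DEFAULT_PAL = 16777215  # F.PAL[0] from "p *P"

print("DEFAULT allowed by J->PAL:", bm_check(P_INDEX, J_PAL))
print("bits set in J->PAL:", set_bits(J_PAL))
print("number of bits set in default PAL:", len(set_bits(DEFAULT_PAL)))
```

Under that assumption the job's mask selects only index 3, so a test against index 1 fails, which would be consistent with the PartitionAccess message. Whether index 3 corresponds to the "lsf" partition mask reported by checkjob is not confirmed by anything in this thread.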

-----Original Message-----
From: mauiusers-bounces at supercluster.org
[mailto:mauiusers-bounces at supercluster.org] On Behalf Of Balle, Susanne
Sent: Monday, December 13, 2004 6:06 AM
To: Dave Jackson; mauiusers at supercluster.org
Subject: RE: [Mauiusers] RE: Problem with MAUI and
partition(PartitionAccess)


Dave,

Enclosed find the maui.cfg file.

Thanks for helping me out,

Susanne

-----Original Message-----
From: Dave Jackson [mailto:jacksond at clusterresources.com] 
Sent: Friday, December 10, 2004 4:01 PM
To: Balle, Susanne; mauiusers at supercluster.org
Subject: Re: [Mauiusers] RE: Problem with MAUI and
partition(PartitionAccess)


Susanne,

  The partition access failure indicates that the credentials associated
with the job do not have access to the resources inside the partition. 
This access should be enabled by default and should only be denied if an
explicit configuration limiting partition access is specified in the
config file.  Can you send us your maui.cfg file?

Dave

>>> "Balle, Susanne" <susanne.balle at hp.com> 12/09/04 11:46 AM >>>

My apologies if you get this twice, Susanne

-----Original Message-----
From: Balle, Susanne 
Sent: Thursday, December 09, 2004 1:21 PM
To: 'mauiusers at supercluster.org'
Cc: Balle, Susanne
Subject: Problem with MAUI and partition (PartitionAccess)



Hi

I am trying to use Maui and SLURM.

I have Maui and SLURM running and they seem to exchange some
information.

When using MAUI as the scheduler, jobs are not started. Jobs are
detected but never started. I am running the following job: "srun -n 2
-t 20 ./slurm.sh" 
where slurm.sh is 
#!/bin/sh
`which hostname`

From the output of checkjob I get the following:
cannot select job 19 for partition DEFAULT (PartitionAccess)

I have enclosed some info below: the output from checkjob 19, the output
from diagnose -t, and tail -110 maui.log, as well as details about how I
built MAUI and integrated it with SLURM. In the output below, job 18 and
job 19 are the same job; job 18 was terminated before I had collected all
the output I needed for this email.

Thanks for any help,

Regards

Susanne

---------------------------------------------------------------
Susanne M. Balle, PhD
Hewlett-Packard
MS ZKO02-3/Q08
110 Spit Brook Road
Nashua, NH 03062

Phone: 603-884-7732
Fax:     603-884-0630

Susanne.Balle at hp.com

-----------------------------------------------------------------------------------

checking job 19

State: Idle
Creds:  user:root  group:root  qos:DEFAULT
WallTime: 00:00:00 of 00:20:00
SubmitTime: Thu Dec  9 18:09:30
  (Time Queued  Total: 00:00:06  Eligible: 00:00:06)

Total Tasks: 1

Req[0]  TaskCount: 1  Partition: ALL
Network: [NONE]  Memory >= 1M  Disk >= 1M  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
NodeCount: 1

IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [lsf]
PE:  1.00  StartPriority:  1

cannot select job 19 for partition DEFAULT (PartitionAccess)

[root at xc14n16 log]# diagnose -t
Displaying Partition Status

System Partition Settings:  PList: DEFAULT PDef: DEFAULT

Name                    Procs

DEFAULT                    10

Partition    Configured         Up     U/C  Dedicated     D/U     Active     A/U

NODE----------------------------------------------------------------------------
DEFAULT               4          4 100.00%          0   0.00%          0   0.00%
PROC----------------------------------------------------------------------------
DEFAULT              10         10 100.00%          0   0.00%          0   0.00%
MEM-----------------------------------------------------------------------------
DEFAULT           12756      12756 100.00%          0   0.00%          0   0.00%
DISK----------------------------------------------------------------------------
DEFAULT           44716      44716 100.00%          0   0.00%          0   0.00%

Class/Queue State

             [<CLASS> <AVAIL>:<UP>]...

     DEFAULT [NONE]

tail -110 maui.log gives me the following.

12/09 17:55:57 INFO:     starting iteration 229
12/09 17:55:57 MRMGetInfo()
12/09 17:55:57 MClusterClearUsage()
12/09 17:55:57 MRMClusterQuery()
12/09 17:55:57 MWikiClusterLoadInfo(XC14N16,RCount,EMsg,SC)
12/09 17:55:57 MWikiDoCommand(XC14N16,7321,9000000,CHECKSUM,CMD=GETNODES ARG=0:ALL,Data,DataSize,SC)
12/09 17:55:57 MSUSendData(S,9000000,TRUE,FALSE)
12/09 17:55:57 INFO:     packet sent (78 bytes of 78)
12/09 17:55:57 INFO:     command sent to server
12/09 17:55:57 INFO:     message sent: 'CMD=GETNODES ARG=0:ALL'
12/09 17:55:57 MSURecvData(S,9000000,1)
12/09 17:55:57 MSURecvPacket(8,Buffer,9,NULL,9000000)
12/09 17:55:57 MSURecvPacket(8,Buffer,269,NULL,9000000)
12/09 17:55:57 MSUDisconnect(S)
12/09 17:55:57 INFO:     received node list through WIKI RM
12/09 17:55:57 INFO:     loading 4 node(s)
12/09 17:55:57 MWikiNodeUpdate(AList,xc14n13)
12/09 17:55:57 MWikiNodeUpdate(AList,xc14n14)
12/09 17:55:57 MWikiNodeUpdate(AList,xc14n15)
12/09 17:55:57 MWikiNodeUpdate(AList,xc14n16)
12/09 17:55:57 INFO:     0 WIKI resources detected on RM XC14N16
12/09 17:55:57 WARNING:  no resources detected
12/09 17:55:57 MRMWorkloadQuery()
12/09 17:55:57 MWikiWorkloadQuery(XC14N16,JCount,SC)
12/09 17:55:57 MWikiDoCommand(XC14N16,7321,9000000,CHECKSUM,CMD=GETJOBS ARG=0:ALL,Data,DataSize,SC)
12/09 17:55:57 MSUSendData(S,9000000,TRUE,FALSE)
12/09 17:55:57 INFO:     packet sent (77 bytes of 77)
12/09 17:55:57 INFO:     command sent to server
12/09 17:55:57 INFO:     message sent: 'CMD=GETJOBS ARG=0:ALL'
12/09 17:55:57 MSURecvData(S,9000000,1)
12/09 17:55:57 MSURecvPacket(8,Buffer,9,NULL,9000000)
12/09 17:55:57 MSURecvPacket(8,Buffer,200,NULL,9000000)
12/09 17:55:57 MSUDisconnect(S)
12/09 17:55:57 INFO:     received job list through WIKI RM
12/09 17:55:57 INFO:     loading 1 job(s)
12/09 17:55:57 MWikiUpdateJob(AList,18,0)
12/09 17:55:57 MUGetIndex(UPDATETIME=1102632406,ValList,0)
12/09 17:55:57 MUGetIndex(STATE=Idle,ValList,0)
12/09 17:55:57 MUGetIndex(WCLIMIT=1200,ValList,0)
12/09 17:55:57 MUGetIndex(TASKS=1,ValList,0)
12/09 17:55:57 MUGetIndex(QUEUETIME=1102632406,ValList,0)
12/09 17:55:57 MUGetIndex(UNAME=root,ValList,0)
12/09 17:55:57 MUGetIndex(GNAME=root,ValList,0)
12/09 17:55:57 MUGetIndex(PARTITIONMASK=lsf,ValList,0)
12/09 17:55:57 MUGetIndex(NODES=1,ValList,0)
12/09 17:55:57 MUGetIndex(RMEM=1,ValList,0)
12/09 17:55:57 MUGetIndex(RDISK=1,ValList,0)
12/09 17:55:57 INFO:     1 WIKI jobs detected on RM XC14N16
12/09 17:55:57 INFO:     jobs detected: 1
12/09 17:55:57 MStatClearUsage(node,Active)
12/09 17:55:57 MClusterUpdateNodeState()
12/09 17:55:57 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
12/09 17:55:57 INFO:     job '18' Priority:        9
12/09 17:55:57 INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:
0(00.0)  Serv:      9(00.0)
Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
12/09 17:55:57 MStatClearUsage([NONE],Active)
12/09 17:55:57 INFO:     total jobs selected (ALL): 1/1
12/09 17:55:57 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
12/09 17:55:57 INFO:     job '18' Priority:        9
12/09 17:55:57 INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:
0(00.0)  Serv:      9(00.0)
Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
12/09 17:55:57 MStatClearUsage([NONE],Idle)
12/09 17:55:57 INFO:     total jobs selected (ALL): 1/1
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
12/09 17:55:57 INFO:     total jobs selected in partition ALL: 1/1
12/09 17:55:57 MQueueScheduleRJobs(Q)
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
12/09 17:55:57 INFO:     total jobs selected in partition ALL: 1/1
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
12/09 17:55:57 INFO:     total jobs selected in partition DEFAULT: 0/1 [PartitionAccess: 1]
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,TRUE)
12/09 17:55:57 INFO:     total jobs selected in partition ALL: 1/1
12/09 17:55:57 MQueueScheduleRJobs(Q)
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
12/09 17:55:57 INFO:     total jobs selected in partition ALL: 1/1
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
12/09 17:55:57 INFO:     total jobs selected in partition DEFAULT: 0/1 [PartitionAccess: 1]
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,TRUE)
12/09 17:55:57 INFO:     total jobs selected in partition ALL: 1/1
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,DEFAULT,FReason,TRUE)
12/09 17:55:57 INFO:     total jobs selected in partition DEFAULT: 0/1 [PartitionAccess: 1]
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
12/09 17:55:57 INFO:     total jobs selected in partition ALL: 1/1
12/09 17:55:57 INFO:     job '18' Priority:        9
12/09 17:55:57 INFO:     Cred:      0(00.0)  FS:      0(00.0)  Attr:
0(00.0)  Serv:      9(00.0)
Targ:      0(00.0)  Res:      0(00.0)  Us:      0(00.0)
12/09 17:55:57 MSchedUpdateStats()
12/09 17:55:57 INFO:     iteration:  229   scheduling time:  0.002 seconds
12/09 17:55:57 MResUpdateStats()
12/09 17:55:57 INFO:     current util[229]:  0/4 (0.00%)  PH: 0.00%  active jobs: 0 of 2 (completed: 0)
12/09 17:55:57 MQueueCheckStatus()
12/09 17:55:57 MNodeCheckStatus()
12/09 17:55:57 MUClearChild(PID)
12/09 17:55:57 INFO:     scheduling complete.  sleeping 20 seconds
[root at xc14n16 log]#
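
The mismatch visible in the output above can be made explicit with a small sketch. It takes the KEY=VALUE attributes exactly as they appear in the MUGetIndex log lines for job 18, and the partition list from the "PList:" line of diagnose -t, and reports whether the job's PARTITIONMASK names a partition that Maui lists. This only illustrates the data in this post and is not Maui's own logic; the function names are hypothetical, and splitting the mask on ":" assumes a colon-delimited list.

```python
# Attributes copied from the MUGetIndex(...) log lines for job 18.
JOB_ATTRS = [
    "UPDATETIME=1102632406", "STATE=Idle", "WCLIMIT=1200", "TASKS=1",
    "QUEUETIME=1102632406", "UNAME=root", "GNAME=root",
    "PARTITIONMASK=lsf", "NODES=1", "RMEM=1", "RDISK=1",
]

# Partition names from "System Partition Settings:  PList: DEFAULT".
KNOWN_PARTITIONS = ["DEFAULT"]


def parse_attrs(tokens):
    """Split KEY=VALUE tokens into a dict (hypothetical helper)."""
    return dict(tok.split("=", 1) for tok in tokens)


def unknown_partitions(attrs, known):
    """Return requested partitions that are not in the known list.

    ASSUMPTION: several partitions would be separated by ':'.
    """
    requested = attrs.get("PARTITIONMASK", "").split(":")
    return [p for p in requested if p and p not in known]


attrs = parse_attrs(JOB_ATTRS)
missing = unknown_partitions(attrs, KNOWN_PARTITIONS)
print("requested:", attrs["PARTITIONMASK"], "| not listed by Maui:", missing)
```

Here the job asks for "lsf" while only DEFAULT is listed, which may be why the job cannot be selected for partition DEFAULT.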

Thanks for any help,

Regards 

Susanne 

-------------------------------------------------------------------------------

For details about how I built MAUI and integrated it with SLURM see
section below.

I downloaded the MAUI kit: maui-3.2.6p9 from the MAUI website and
compiled MAUI from its source distribution. I tried to follow the steps
located at http://www.llnl.gov/linux/slurm/maui.html

The configuration step asked me neither whether I wanted to build MAUI
with PBS nor for a checksum seed, although both prompts are documented in
the SLURM integration document.

Reading further down in the SLURM integration instructions, I noticed that
SLURM uses the Wiki interface to MAUI.

From the doc it looks like my configure line should look something
like:

./configure --with-key=42 --with-wiki

Completed as expected

gmake

Completed as expected

Next I updated the MAUI configuration file, maui.cfg, with the following
info:

# Resource Manager Definition

RMCFG[XC14N16] TYPE=WIKI
RMPORT          7321
RMHOST          XC14N16
RMPOLLINTERVAL  00:00:20

In /hptc_cluster/slurm/etc/slurm.conf I uncommented the following lines:
SchedulerType=sched/wiki
SchedulerAuth=42
SchedulerPort=7321
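
A quick sanity check on the settings above can be sketched as follows. It assumes that SchedulerAuth is meant to equal the seed given to --with-key and that SchedulerPort is meant to equal RMPORT; that is how the SLURM integration page appears to describe it, so treat both as assumptions. The parser functions are hypothetical helpers written for this post, not part of Maui or SLURM.

```python
MAUI_CONFIGURE_KEY = "42"  # from: ./configure --with-key=42 --with-wiki

MAUI_CFG = """\
RMCFG[XC14N16] TYPE=WIKI
RMPORT          7321
RMHOST          XC14N16
RMPOLLINTERVAL  00:00:20
"""

SLURM_CONF = """\
SchedulerType=sched/wiki
SchedulerAuth=42
SchedulerPort=7321
"""


def parse_maui(text):
    """Parse whitespace-separated 'NAME value' lines into a dict."""
    out = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            out[parts[0]] = parts[1].strip()
    return out


def parse_slurm(text):
    """Parse 'Name=value' lines into a dict."""
    out = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            out[key.strip()] = value.strip()
    return out


maui = parse_maui(MAUI_CFG)
slurm = parse_slurm(SLURM_CONF)
port_ok = maui["RMPORT"] == slurm["SchedulerPort"]
key_ok = MAUI_CONFIGURE_KEY == slurm["SchedulerAuth"]
print("port matches:", port_ok, "| key matches:", key_ok)
```

Both values agree in this setup, which fits the log output showing that Maui does receive node and job lists from SLURM.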

I started maui and slurm, and some of the commands work.

[root at xc14n16 log]# showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME


     0 Active Jobs       0 of   10 Processors Active (0.00%)
                         0 of    4 Nodes Active      (0.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

18                     root       Idle     1    00:20:00  Thu Dec  9 17:46:46

1 Idle Job

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


Total Jobs: 1   Active Jobs: 0   Idle Jobs: 1   Blocked Jobs: 0
[root at xc14n16 log]#


_______________________________________________
mauiusers mailing list
mauiusers at supercluster.org
http://supercluster.org/mailman/listinfo/mauiusers



