[Mauiusers] RE: Problem with MAUI and partition(PartitionAccess)
Balle, Susanne
susanne.balle at hp.com
Tue Dec 14 08:49:26 MST 2004
I am not sure if this helps, but I stepped through Mpolicy.c with gdb and
the statement
if ((P->Index > 1) && (MUBMCheck(P->Index, J->PAL) == FALSE))
evaluates to true.
I printed out *P and *J below.
Does anything look wrong in *J and *P?
Thanks,
Susanne
(gdb) p *P
$11 = {Name = "DEFAULT", '\0' <repeats 57 times>, Index = 1, ConfigNodes
= 4, IdleNodes = 4,
ActiveNodes = 0, UpNodes = 4, CRes = {Procs = 10, Mem = 12756, Swap = 0,
Disk = 44716, PSlot = {{
count = 0} <repeats 16 times>}, GRes = {{count = 0}, {count = 0}, {count
= 0}, {count = 0}}},
URes = {Procs = 10, Mem = 12756, Swap = 0, Disk = 44716, PSlot = {{count
= 0} <repeats 16 times>},
GRes = {{count = 0}, {count = 0}, {count = 0}, {count = 0}}}, ARes =
{Procs = 10, Mem = 12756,
Swap = 0, Disk = 44716, PSlot = {{count = 0} <repeats 16 times>}, GRes =
{{count = 0}, {
count = 0}, {count = 0}, {count = 0}}}, DRes = {Procs = 0, Mem = 0, Swap
= 0, Disk = 0,
PSlot = {{count = 0} <repeats 16 times>}, GRes = {{count = 0}, {count =
0}, {count = 0}, {
count = 0}}}, S = {Count = 0, NCount = 0, JobCountSubmitted = 0,
JobCountSuccessful = 0,
QOSMet = 0, RejectionCount = 0, TotalQTS = 0, MaxQTS = 0, TotalQueuedPH
= 0,
TotalRequestTime = 0, TotalRunTime = 0, PSRequest = 0,
PSRequestSubmitted = 0, PSRun = 0,
PSRunSuccessful = 0, PSDedicated = 0, PSUtilized = 0, MSAvail = 0,
MSUtilized = 0,
MSDedicated = 0, PS2Dedicated = 0, PS2Utilized = 0, JobAcc = 0, NJobAcc
= 0, XFactor = 0,
NXFactor = 0, PSXFactor = 0, MaxXFactor = 0, Bypass = 0, MaxBypass = 0,
BFCount = 0, BFPSRun = 0,
Accuracy = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0}, DStat = {{Data = 0x0, DSize =
0, Count = 0, Size = 0},
{Data = 0x0, DSize = 0, Count = 0, Size = 0}, {Data = 0x0, DSize = 0,
Count = 0, Size = 0}, {
Data = 0x0, DSize = 0, Count = 0, Size = 0}, {Data = 0x0, DSize = 0,
Count = 0, Size = 0}, {
Data = 0x0, DSize = 0, Count = 0, Size = 0}, {Data = 0x0, DSize = 0,
Count = 0, Size = 0}, {
Data = 0x0, DSize = 0, Count = 0, Size = 0}}}, FSC = {PCW = {-1, -1,
-1, -1, -1, -1, -1, -1},
PCP = {0, 0, 0, 0, 0, 0, 0, 0}, PSW = {-1 <repeats 32 times>}, PSP = {0
<repeats 32 times>},
PCC = {0, 0, 0, 0, 0, 0, 0, 0}, PSC = {0 <repeats 32 times>},
XFMinWCLimit = 0, FSPolicy = 0,
FSInterval = 0, FSDepth = 0, FSDecay = 0}, L = {AP = {Usage = {{0, 0, 0,
0}, {0, 0, 0, 0}, {0, 0,
0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0,
0, 0}, {0, 0, 0, 0}},
SLimit = {{0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0,
0, 0}, {0, 0, 0, 0}, {
0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}}, HLimit = {{0, 0, 0, 0}, {0,
0, 0, 0}, {0, 0, 0,
0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0, 0}, {0, 0, 0,
0}, {0, 0, 0, 0}}},
IP = 0x2fdcfd0, JP = 0x2fdd190, OAP = 0x0, OIP = 0x0, OJP = 0x0, APU =
0x0, APC = 0x0, APG = 0x0,
APQ = 0x0, JDef = 0x0, JMax = 0x0, JMin = 0x0}, F = {JobFlags = 0,
Priority = 0,
IsLocalPriority = 0 '\0', Overrun = 0, ADef = 0x0, AAL = 0x0, QDef =
0x1120760, QAL = {1, 0, 0,
0, 0}, QALType = 0, PDef = 0xc8e790, PAL = {16777215}, PALType = 0,
FSTarget = 0, FSMode = 0,
FSFactor = 0, FSUsage = {0 <repeats 24 times>}, IsInitialized = 0},
BFPolicy = 1, BFDepth = 0,
BFMetric = 1, BFProcFactor = 0, BFMaxSchedules = 10000, BFPriorityPolicy
= 0, BFChunkSize = 0,
BFChunkDuration = 0, BFChunkBlockTime = 0, MaxMetaTasks = 0,
MaxJobStartTime = 2140000000,
NAllocPolicy = 1, DistPolicy = 1, JobSizePolicy = 0, JobNodeMatch = 0,
ResPolicy = 0,
ResRetryTime = 0, ResTType = 0, ResTValue = 0, ResDepth = {1, 0 <repeats
127 times>}, ResCount = {
0 <repeats 128 times>}, ResQOSList = {{0x80, 0x0 <repeats 127 times>}
<repeats 128 times>},
UseMachineSpeed = 0, UseSystemQueueTime = 1, UseCPUTime = 0,
RejectNegPrioJobs = 0,
EnableNegJobPriority = 0, EnableMultiNodeJobs = 1, EnableMultiReqJobs =
0,
JobPrioAccrualPolicy = 2, NAvailPolicy = "\001\000\000\000\000",
ResourceLimitPolicy = {mrlpNONE,
mrlpNONE, mrlpNONE, mrlpNONE, mrlpNONE, mrlpNONE},
ResourceLimitViolationAction = {mrlaRequeue,
mrlaRequeue, mrlaRequeue, mrlaRequeue, mrlaRequeue, mrlaRequeue},
ResourceLimitMaxViolationTime = {0, 0, 0, 0, 0, 0}, AdminMinSTime = 0,
UseLocalMachinePriority = 0,
NodeLoadPolicy = 1, NodeSetPolicy = 0, NodeSetAttribute = 0, NodeSetList
= {
0x0 <repeats 128 times>}, NodeSetDelay = 0, NodeSetTolerance = 0,
NodeSetPriorityType = 4,
UntrackedProcFactor = 1.2, NodeDownStateDelayTime = 3600, XRes = {{Name
= '\0' <repeats 63 times>,
Type = 0, Func = 0, Data = 0x0} <repeats 5120 times>}, Message = 0x0}
(gdb) p *J
$13 = {Name = "15", '\0' <repeats 62 times>, AName = 0x0, RMJID = 0x0,
Index = 1, CTime = 1103055570,
MTime = 1103055565, ATime = 1103055975, StateMTime = 0, SWallTime = 0,
AWallTime = 0, Ckpt = 0x0,
Next = 0x2fc4250, Prev = 0x2fc4250, C = {TotalProcCount = 2}, RCL = {{
Name = '\0' <repeats 63 times>, Value = 0, Flags = 0, Type = 0 '\0',
Cmp = 0 '\0',
Affinity = 0 '\0'} <repeats 32 times>}, Cred = {U = 0x2ff51e0, G =
0x11d7880, A = 0x0, C = 0x0,
Q = 0x1120760, CredType = 0, ACL = 0x2ffe3b0, CL = 0x3014f90, MTime =
0}, QReq = 0x0, R = 0x0,
ResName = '\0' <repeats 63 times>, RAList = {'\0' <repeats 63 times>
<repeats 16 times>},
RM = 0x13e02e0, SessionID = 0, NeedNodes = 0x0, SIData = 0x0, SOData =
0x0, Req = {0x3002ee0, 0x0,
0x0, 0x0, 0x0}, GReq = 0x0, MReq = 0x0, E = {IWD = 0x0, Cmd = 0x0,
Input = 0x0, Output = 0x0,
Error = 0x0, Args = 0x0, Env = 0x0, Shell = 0x0, JobType = 0x0, PID =
0, SID = 0}, NodeCount = 0,
TaskCount = 0, TasksRequested = 0, NodesRequested = 0, Alloc = {TC = 0,
NC = 0}, Request = {TC = 2,
NC = 1}, Geometry = 0x0, SpecDistPolicy = 0 '\0', DistPolicy = 0 '\0',
SystemQueueTime = 1103055565, EffQueueDuration = 410, SMinTime = 0,
SpecSMinTime = 0,
SysSMinTime = 0, CMaxTime = 2140000000, TermTime = 2140000000, RMinTime
= 0,
SubmitTime = 1103055565, StartTime = 0, DispatchTime = 0,
CompletionTime = 0, LastNotifyTime = 0,
CompletionCode = 0, StartCount = 0, DeferCount = 0, PreemptCount = 0,
Bypass = 0, HoldReason = 0,
BlockReason = 0, SyncDeadLine = 2140000000, Message = 0x0, WCLimit =
1800, CPULimit = 0,
SpecWCLimit = {1800, 1800, 0 <repeats 31 times>}, RemainingTime = 0,
SimWCTime = 0,
MinMachineSpeed = 0, SpecPAL = {8}, PAL = {8}, ImageSize = 0, ExecSize
= 0, State = mjsIdle,
EState = mjsIdle, IState = mjsNONE, Hold = 0, SuspendType = 0,
SystemPrio = 0, UPriority = 0,
StartPriority = 6, RunPriority = 0, TaskMap = {-1, 0, 0, 0, 0, 0, 0, 0,
9, 0, 0, 0, 0, 0, 0, 0,
269, 0 <repeats 11 times>, 12, 0, 0, 0, 21568, 137, 0, 0, -12096,
-27305, 42, 0, 19200, 767, 0,
0, 0, 0, 0, 0, -20424, -16404, 127, 0, 1046, -160, -1, -1, 19154,
16831, 0, 0, -29465, 0, 0, 0,
19154, 16831, 0, 0, -25411, -27256, 42, 0, -15216, -16404, 127, 0,
-26295, 73, 0, 0, 0, 0, 1,
0 <repeats 33 times>, 19154, 16831, 0, 0, 19245, 767, 0 <repeats 2086
times>, 14132, 13667,
26164, 13874, 14640, 25656, 12852, 25905, 0 <repeats 452 times>,
-12096, -27305, 42, 0, 0, 0, 0,
0, -13904, -16404, 127, 0, -15232, -16404, 127, 0, 1, 0, 0, 0, -13632,
-16404, 127, 0, 7923, 76,
0, 0, 0, 0, 0, 0, -13616, -16404, 127, 0, 32, 0, 0, 0, -16614, -27260,
42, 0, 48, 0, 0, 0, 0, 0,
0, 0, 14132, 13667, 26164, 13874, 20792, -27258, 42, 0, 21504, 15699,
12593, 13104, 16400, 764,
0, 0, 32, 0, 0, 0, -16384, -27306...}, NodeList = 0x2fed5b0, ReqHList =
0x0, ExcHList = 0x0,
ReqHLMode = 0, RType = 0, Cluster = 0, Proc = 0, SubmitHost = '\0'
<repeats 63 times>, Flags = 0,
SpecFlags = 0, SysFlags = 0, AttrBM = 0, MasterJobName = 0x0,
MasterHostName = 0x0, PSUtilized = 0,
PSDedicated = 0, MSUtilized = 0, MSDedicated = 0, RMXString = 0x0,
RMSubmitString = 0x0,
RMSubmitType = 0, SystemID = 0x0, SystemJID = 0x0, Depend = 0x0,
RULVTime = 0, FeatureMap = {0},
ASFunc = 0, ASData = 0x0, xd = 0x0}
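For readers following the dump above, here is a minimal sketch of the kind of bit test the quoted condition appears to perform. It assumes that MUBMCheck(Index, Mask) simply reports whether bit Index is set in the first word of the mask; that is an assumption about Maui's helper, not a copy of it, and bm_check and set_bits are hypothetical names. The values are taken from the dump: P->Index = 1, J->PAL = {8}, and the default PAL in P->F is {16777215}.

```python
def bm_check(index: int, mask: int) -> bool:
    """Return True if bit `index` is set in `mask` (assumed MUBMCheck behavior)."""
    return bool((mask >> index) & 1)


def set_bits(mask: int) -> list[int]:
    """List the bit positions that are set in `mask`."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


P_INDEX = 1             # P->Index from the gdb dump
JOB_PAL = 8             # J->PAL = {8}
DEFAULT_PAL = 16777215  # P->F.PAL = {16777215}, i.e. 0xFFFFFF

if __name__ == "__main__":
    print(set_bits(JOB_PAL))               # bit positions present in the job's mask
    print(bm_check(P_INDEX, JOB_PAL))      # is partition index 1 in the job's mask?
    print(bm_check(P_INDEX, DEFAULT_PAL))  # is it in the default mask?
```

Under that assumption, the job's mask (8, only bit 3 set) does not include partition index 1, while the default mask does, which would be consistent with the PartitionAccess rejection reported by checkjob.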
-----Original Message-----
From: mauiusers-bounces at supercluster.org
[mailto:mauiusers-bounces at supercluster.org] On Behalf Of Balle, Susanne
Sent: Monday, December 13, 2004 6:06 AM
To: Dave Jackson; mauiusers at supercluster.org
Subject: RE: [Mauiusers] RE: Problem with MAUI and
partition(PartitionAccess)
Dave,
Enclosed find the maui.cfg file.
Thanks for helping me out,
Susanne
-----Original Message-----
From: Dave Jackson [mailto:jacksond at clusterresources.com]
Sent: Friday, December 10, 2004 4:01 PM
To: Balle, Susanne; mauiusers at supercluster.org
Subject: Re: [Mauiusers] RE: Problem with MAUI and
partition(PartitionAccess)
Susanne,
The partition access failure indicates that the credentials associated
with the job do not have access to the resources inside the partition.
This access should be enabled by default and should only be denied if an
explicit configuration limiting partition access is specified in the
config file. Can you send us your maui.cfg file?
Dave
>>> "Balle, Susanne" <susanne.balle at hp.com> 12/09/04 11:46 AM >>>
My apologies if you get this twice, Susanne
-----Original Message-----
From: Balle, Susanne
Sent: Thursday, December 09, 2004 1:21 PM
To: 'mauiusers at supercluster.org'
Cc: Balle, Susanne
Subject: Problem with MAUI and partition (PartitionAccess)
Hi
I am trying to use Maui and SLURM.
I have Maui and SLURM running and they seem to exchange some
information.
When using MAUI as the scheduler, jobs are detected but never started.
I am running the following job: "srun -n 2 -t 20 ./slurm.sh"
where slurm.sh is
#!/bin/sh
`which hostname`
From the output of checkjob I get the following:
cannot select job 19 for partition DEFAULT (PartitionAccess)
I have enclosed some info below: output from checkjob 19, output from
diagnose -t, tail -110 maui.log, as well as details about how I built
MAUI and integrated it with SLURM. In the output below, job 18 and job 19
are the same job. I terminated job 18 before I had all the
output I needed for this email.
Thanks for any help,
Regards
Susanne
---------------------------------------------------------------
Susanne M. Balle, PhD
Hewlett-Packard
MS ZKO02-3/Q08
110 Spit Brook Road
Nashua, NH 03062
Phone: 603-884-7732
Fax: 603-884-0630
Susanne.Balle at hp.com
------------------------------------------------------------------------
-----------
checking job 19
State: Idle
Creds: user:root group:root qos:DEFAULT
WallTime: 00:00:00 of 00:20:00
SubmitTime: Thu Dec 9 18:09:30
(Time Queued Total: 00:00:06 Eligible: 00:00:06)
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 1M Disk >= 1M Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [NONE]
NodeCount: 1
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [lsf]
PE: 1.00 StartPriority: 1
cannot select job 19 for partition DEFAULT (PartitionAccess)
[root@xc14n16 log]# diagnose -t
Displaying Partition Status
System Partition Settings: PList: DEFAULT PDef: DEFAULT
Name Procs
DEFAULT 10
Partition  Configured     Up      U/C  Dedicated      D/U  Active      A/U
NODE----------------------------------------------------------------------
DEFAULT             4      4  100.00%          0    0.00%       0    0.00%
PROC----------------------------------------------------------------------
DEFAULT            10     10  100.00%          0    0.00%       0    0.00%
MEM-----------------------------------------------------------------------
DEFAULT         12756  12756  100.00%          0    0.00%       0    0.00%
DISK----------------------------------------------------------------------
DEFAULT         44716  44716  100.00%          0    0.00%       0    0.00%
Class/Queue State
[<CLASS> <AVAIL>:<UP>]...
DEFAULT [NONE]
tail -110 maui.log gives me the following.
12/09 17:55:57 INFO: starting iteration 229
12/09 17:55:57 MRMGetInfo()
12/09 17:55:57 MClusterClearUsage()
12/09 17:55:57 MRMClusterQuery()
12/09 17:55:57 MWikiClusterLoadInfo(XC14N16,RCount,EMsg,SC)
12/09 17:55:57 MWikiDoCommand(XC14N16,7321,9000000,CHECKSUM,CMD=GETNODES ARG=0:ALL,Data,DataSize,SC)
12/09 17:55:57 MSUSendData(S,9000000,TRUE,FALSE)
12/09 17:55:57 INFO: packet sent (78 bytes of 78)
12/09 17:55:57 INFO: command sent to server
12/09 17:55:57 INFO: message sent: 'CMD=GETNODES ARG=0:ALL'
12/09 17:55:57 MSURecvData(S,9000000,1)
12/09 17:55:57 MSURecvPacket(8,Buffer,9,NULL,9000000)
12/09 17:55:57 MSURecvPacket(8,Buffer,269,NULL,9000000)
12/09 17:55:57 MSUDisconnect(S)
12/09 17:55:57 INFO: received node list through WIKI RM
12/09 17:55:57 INFO: loading 4 node(s)
12/09 17:55:57 MWikiNodeUpdate(AList,xc14n13)
12/09 17:55:57 MWikiNodeUpdate(AList,xc14n14)
12/09 17:55:57 MWikiNodeUpdate(AList,xc14n15)
12/09 17:55:57 MWikiNodeUpdate(AList,xc14n16)
12/09 17:55:57 INFO: 0 WIKI resources detected on RM XC14N16
12/09 17:55:57 WARNING: no resources detected
12/09 17:55:57 MRMWorkloadQuery()
12/09 17:55:57 MWikiWorkloadQuery(XC14N16,JCount,SC)
12/09 17:55:57 MWikiDoCommand(XC14N16,7321,9000000,CHECKSUM,CMD=GETJOBS ARG=0:ALL,Data,DataSize,SC)
12/09 17:55:57 MSUSendData(S,9000000,TRUE,FALSE)
12/09 17:55:57 INFO: packet sent (77 bytes of 77)
12/09 17:55:57 INFO: command sent to server
12/09 17:55:57 INFO: message sent: 'CMD=GETJOBS ARG=0:ALL'
12/09 17:55:57 MSURecvData(S,9000000,1)
12/09 17:55:57 MSURecvPacket(8,Buffer,9,NULL,9000000)
12/09 17:55:57 MSURecvPacket(8,Buffer,200,NULL,9000000)
12/09 17:55:57 MSUDisconnect(S)
12/09 17:55:57 INFO: received job list through WIKI RM
12/09 17:55:57 INFO: loading 1 job(s)
12/09 17:55:57 MWikiUpdateJob(AList,18,0)
12/09 17:55:57 MUGetIndex(UPDATETIME=1102632406,ValList,0)
12/09 17:55:57 MUGetIndex(STATE=Idle,ValList,0)
12/09 17:55:57 MUGetIndex(WCLIMIT=1200,ValList,0)
12/09 17:55:57 MUGetIndex(TASKS=1,ValList,0)
12/09 17:55:57 MUGetIndex(QUEUETIME=1102632406,ValList,0)
12/09 17:55:57 MUGetIndex(UNAME=root,ValList,0)
12/09 17:55:57 MUGetIndex(GNAME=root,ValList,0)
12/09 17:55:57 MUGetIndex(PARTITIONMASK=lsf,ValList,0)
12/09 17:55:57 MUGetIndex(NODES=1,ValList,0)
12/09 17:55:57 MUGetIndex(RMEM=1,ValList,0)
12/09 17:55:57 MUGetIndex(RDISK=1,ValList,0)
12/09 17:55:57 INFO: 1 WIKI jobs detected on RM XC14N16
12/09 17:55:57 INFO: jobs detected: 1
12/09 17:55:57 MStatClearUsage(node,Active)
12/09 17:55:57 MClusterUpdateNodeState()
12/09 17:55:57 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
12/09 17:55:57 INFO: job '18' Priority: 9
12/09 17:55:57 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 9(00.0)
Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
12/09 17:55:57 MStatClearUsage([NONE],Active)
12/09 17:55:57 INFO: total jobs selected (ALL): 1/1
12/09 17:55:57 MQueueSelectAllJobs(Q,SOFT,ALL,JIList,DP,Msg)
12/09 17:55:57 INFO: job '18' Priority: 9
12/09 17:55:57 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 9(00.0)
Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
12/09 17:55:57 MStatClearUsage([NONE],Idle)
12/09 17:55:57 INFO: total jobs selected (ALL): 1/1
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,FALSE)
12/09 17:55:57 INFO: total jobs selected in partition ALL: 1/1
12/09 17:55:57 MQueueScheduleRJobs(Q)
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
12/09 17:55:57 INFO: total jobs selected in partition ALL: 1/1
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
12/09 17:55:57 INFO: total jobs selected in partition DEFAULT: 0/1 [PartitionAccess: 1]
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,TRUE)
12/09 17:55:57 INFO: total jobs selected in partition ALL: 1/1
12/09 17:55:57 MQueueScheduleRJobs(Q)
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
12/09 17:55:57 INFO: total jobs selected in partition ALL: 1/1
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
12/09 17:55:57 INFO: total jobs selected in partition DEFAULT: 0/1 [PartitionAccess: 1]
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,EVERY,FReason,TRUE)
12/09 17:55:57 INFO: total jobs selected in partition ALL: 1/1
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,HARD,5120,4096,2140000000,DEFAULT,FReason,TRUE)
12/09 17:55:57 INFO: total jobs selected in partition DEFAULT: 0/1 [PartitionAccess: 1]
12/09 17:55:57 MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
12/09 17:55:57 INFO: total jobs selected in partition ALL: 1/1
12/09 17:55:57 INFO: job '18' Priority: 9
12/09 17:55:57 INFO: Cred: 0(00.0) FS: 0(00.0) Attr: 0(00.0) Serv: 9(00.0)
Targ: 0(00.0) Res: 0(00.0) Us: 0(00.0)
12/09 17:55:57 MSchedUpdateStats()
12/09 17:55:57 INFO: iteration: 229 scheduling time: 0.002 seconds
12/09 17:55:57 MResUpdateStats()
12/09 17:55:57 INFO: current util[229]: 0/4 (0.00%) PH: 0.00% active jobs: 0 of 2 (completed: 0)
12/09 17:55:57 MQueueCheckStatus()
12/09 17:55:57 MNodeCheckStatus()
12/09 17:55:57 MUClearChild(PID)
12/09 17:55:57 INFO: scheduling complete. sleeping 20 seconds
[root@xc14n16 log]#
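The MUGetIndex trace lines in the log above list the job attributes Maui receives over the Wiki interface, including PARTITIONMASK=lsf, while diagnose -t shows only a DEFAULT partition. The following is a small sketch for pulling those attributes out of a saved maui.log. It assumes the trace-line format is exactly as it appears in this excerpt; job_attributes is a hypothetical helper, not part of Maui.

```python
import re

# Matches trace lines of the form MUGetIndex(KEY=VALUE,ValList,0) as seen in the excerpt.
ATTR_RE = re.compile(r"MUGetIndex\((\w+)=([^,]*),ValList,\d+\)")


def job_attributes(log_text: str) -> dict[str, str]:
    """Collect KEY=VALUE pairs from MUGetIndex trace lines (last value wins)."""
    return dict(ATTR_RE.findall(log_text))


# A few lines copied from the log excerpt.
SAMPLE = """\
12/09 17:55:57 MUGetIndex(STATE=Idle,ValList,0)
12/09 17:55:57 MUGetIndex(WCLIMIT=1200,ValList,0)
12/09 17:55:57 MUGetIndex(TASKS=1,ValList,0)
12/09 17:55:57 MUGetIndex(PARTITIONMASK=lsf,ValList,0)
"""

if __name__ == "__main__":
    attrs = job_attributes(SAMPLE)
    print(attrs["PARTITIONMASK"])  # partition name the resource manager reported for the job
```

Comparing the reported PARTITIONMASK value with the partition names listed by diagnose -t is one quick way to spot a mismatch between what the resource manager reports and what Maui has configured.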
Thanks for any help,
Regards
Susanne
------------------------------------------------------------------------
-------
For details about how I built MAUI and integrated it with SLURM see
section below.
I downloaded the MAUI kit: maui-3.2.6p9 from the MAUI website and
compiled MAUI from its source distribution. I tried to follow the steps
located at http://www.llnl.gov/linux/slurm/maui.html
The configuration step did not ask whether I wanted to build MAUI with PBS,
nor did it ask for a checksum seed, although both are documented in the
SLURM integration document.
Reading further down in the SLURM integration instructions I noticed that
SLURM will be using the Wiki interface to MAUI.
From the doc it looks like my configure line should look something
like:
./configure --with-key=42 --with-wiki
Completed as expected
gmake
Completed as expected
Next I updated the MAUI configuration file, maui.cfg, with the following
info:
# Resource Manager Definition
RMCFG[XC14N16] TYPE=WIKI
RMPORT 7321
RMHOST XC14N16
RMPOLLINTERVAL 00:00:20
In /hptc_cluster/slurm/etc/slurm.conf
I uncommented the following lines:
SchedulerType=sched/wiki
SchedulerAuth=42
SchedulerPort=7321
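Since the port and key have to line up across the configure line, maui.cfg and slurm.conf, a quick consistency check can rule out a simple mismatch. This is only a sketch: the two parsers below are hypothetical helpers that handle just the plain "NAME value" and "Name=value" lines quoted in this message, and the configure key is passed in by hand because it is not stored in either file.

```python
def parse_maui_cfg(text: str) -> dict[str, str]:
    """Parse whitespace-separated 'NAME value' lines, ignoring '#' comments."""
    out = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        parts = line.split(None, 1)
        if len(parts) == 2:
            out[parts[0]] = parts[1].strip()
    return out


def parse_slurm_conf(text: str) -> dict[str, str]:
    """Parse 'Name=value' lines, ignoring '#' comments."""
    out = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, value = line.split("=", 1)
            out[key.strip()] = value.strip()
    return out


# Settings as quoted in this message.
MAUI_CFG = """\
# Resource Manager Definition
RMCFG[XC14N16] TYPE=WIKI
RMPORT 7321
RMHOST XC14N16
RMPOLLINTERVAL 00:00:20
"""
SLURM_CONF = """\
SchedulerType=sched/wiki
SchedulerAuth=42
SchedulerPort=7321
"""
CONFIGURE_KEY = "42"  # value given to ./configure --with-key

if __name__ == "__main__":
    maui = parse_maui_cfg(MAUI_CFG)
    slurm = parse_slurm_conf(SLURM_CONF)
    print("ports match:", maui["RMPORT"] == slurm["SchedulerPort"])
    print("keys match:", CONFIGURE_KEY == slurm["SchedulerAuth"])
```

For the values shown here both checks pass, so the settings quoted above agree with each other on port and key.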
I started maui and slurm, and some of the commands work.
[root@xc14n16 log]# showq
ACTIVE JOBS--------------------
JOBNAME   USERNAME   STATE   PROC   REMAINING   STARTTIME
0 Active Jobs 0 of 10 Processors Active (0.00%)
0 of 4 Nodes Active (0.00%)
IDLE JOBS----------------------
JOBNAME   USERNAME   STATE   PROC   WCLIMIT    QUEUETIME
18        root       Idle    1      00:20:00   Thu Dec 9 17:46:46
1 Idle Job
BLOCKED JOBS----------------
JOBNAME   USERNAME   STATE   PROC   WCLIMIT    QUEUETIME
Total Jobs: 1 Active Jobs: 0 Idle Jobs: 1 Blocked Jobs: 0
[root@xc14n16 log]#
_______________________________________________
mauiusers mailing list
mauiusers at supercluster.org
http://supercluster.org/mailman/listinfo/mauiusers