[Mauiusers] Understanding diagnose -r
Jan.Ploski at offis.de
Thu Nov 22 05:13:19 MST 2007
mauiusers-bounces at supercluster.org wrote on 11/11/2007 12:35:55 AM:
> I'm trying to understand the output of diagnose -r. First, for every job
> submitted by Globus, an entry like this one is presented:
> 361802 Job DEF -00:32:41 10:47:19 11:20:00
> 1 1 1
> ACL: JOB==361802=
> CL: JOB==361802 USER==dgad0005 GROUP==ad CLASS==dgiseq QOS==dgiseq
> DURATION==11:20:00 PROC==1
> WARNING: reservation '361802' has 1 proc(s) allocated but 0 detected
> What does the last line mean?
Answering my own questions for posterity:
The line (as far as I could tell!) means that according to TORQUE (e.g.,
pbsnodes -a, qstat) one processor is being used by the job, but according
to Maui 0 processors are being used.
> Moreover, I tried to create a reservation of a single machine, which is
> reported as follows:
> ib.2 User DEF -00:01:55 3:58:05 4:00:00
> 1 1 8
> Flags: PREEMPTEE
> ACL: RES==ib= CLASS==ib+
> CL: RES==ib
> Task Resources: PROCS: [ALL]
> Attributes (HostList='node1' MaxTasks=1)
> Active PH: 0.11/0.30 (37.50%)
> WARNING: reservation 'ib.2' has 8 proc(s) allocated but 5 detected
> Again, what does the last line mean? This 8-processor machine is
> currently occupied by 8 jobs (which were allocated to it before the
> reservation). I cannot make any sense of the '5 detected' part.
Here, likewise, it means that all 8 of the machine's processors are in
fact occupied by jobs, but Maui thinks that only 5 of them are occupied.
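For anyone who wants to spot these mismatches without reading the whole report, here is a minimal Python sketch that pulls the per-reservation warnings out of saved diagnose -r output. It assumes the warning text has exactly the wording quoted in this thread. The function name find_mismatches is made up for illustration and is not part of Maui.

```python
import re

# Matches the per-reservation warning with the wording quoted in this thread.
WARNING_RE = re.compile(
    r"WARNING:\s+reservation '([^']+)' has (\d+) proc\(s\) allocated "
    r"but (\d+) detected"
)


def find_mismatches(diagnose_output):
    """Return a list of (reservation, allocated, detected) tuples."""
    mismatches = []
    for line in diagnose_output.splitlines():
        # search() rather than match(), so mail-quoted lines ("> ...") work too.
        match = WARNING_RE.search(line)
        if match:
            name, allocated, detected = match.groups()
            mismatches.append((name, int(allocated), int(detected)))
    return mismatches


# The two warning lines quoted above, used as sample input.
sample = """\
WARNING: reservation '361802' has 1 proc(s) allocated but 0 detected
WARNING: reservation 'ib.2' has 8 proc(s) allocated but 5 detected
"""

for name, allocated, detected in find_mismatches(sample):
    print(f"{name}: {allocated - detected} processor(s) not detected by Maui")
```

The difference between the two numbers is the count of processors that are allocated but that Maui does not see as in use.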
> I also don't understand "MaxTasks=1".
(This one is still unresolved.)
> The output of diagnose -r ends with
> Active Reserved Processors: 164
> WARNING: reservation table is corrupt: active procs reserved does not
> equal active procs detected (164 != 74)
> Should I be concerned about this?
The cause of the above phenomena was a misconfigured queue:
set queue dgiseq resources_max.ncpus = 0
Jobs submitted to this queue were (amazingly) still executing, but from
Maui's viewpoint they were not consuming any processor resources. As a
nasty side effect, Maui assigned new jobs to the already occupied nodes
even though other, genuinely free nodes were available. The wrongly
assigned jobs would not run, and they also prevented lower-priority jobs
from running.
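To check other queues for the same mistake, something like the following Python sketch may help. It scans saved queue configuration text for queues whose resources_max.ncpus is set to 0. It assumes each setting appears on its own line in the "set queue ..." form shown above (as I would expect from dumping the configuration with qmgr). The function name queues_with_zero_ncpus is made up, and only the ncpus line of the sample comes from this thread; the other two sample lines are hypothetical.

```python
import re

# One setting per line, in the same form as the misconfigured line above.
SET_NCPUS_RE = re.compile(
    r"^\s*set queue (\S+) resources_max\.ncpus\s*=\s*(\d+)\s*$"
)


def queues_with_zero_ncpus(config_text):
    """Return the names of queues whose resources_max.ncpus is 0."""
    suspicious = []
    for line in config_text.splitlines():
        match = SET_NCPUS_RE.match(line)
        if match and int(match.group(2)) == 0:
            suspicious.append(match.group(1))
    return suspicious


# Only the last line is from this thread; the first two are hypothetical.
sample = """\
create queue dgiseq
set queue dgiseq queue_type = Execution
set queue dgiseq resources_max.ncpus = 0
"""

print(queues_with_zero_ncpus(sample))
```

Any queue this reports is a candidate for the behaviour described above, where jobs run but Maui counts no processors for them.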