[Mauiusers] Understanding diagnose -r

Jan Ploski Jan.Ploski at offis.de
Thu Nov 22 05:13:19 MST 2007


mauiusers-bounces at supercluster.org wrote on 11/11/2007 12:35:55 AM:

> Hello,
> 
> I'm trying to understand the output of diagnose -r. First, for every job
> submitted by Globus, an entry like this one is presented:
> 
> 361802                      Job DEF   -00:32:41    10:47:19     11:20:00     1    1    1
>      ACL: JOB==361802=
>      CL:  JOB==361802 USER==dgad0005 GROUP==ad CLASS==dgiseq QOS==dgiseq DURATION==11:20:00 PROC==1
> WARNING:  reservation '361802' has 1 proc(s) allocated but 0 detected
>
> What does the last line mean?

Answering my own questions for posterity:

The line (as far as I could tell!) means that according to TORQUE (e.g., 
pbsnodes -a, qstat) one processor is being used by the job, but according 
to Maui 0 processors are being used.
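If you want to collect these mismatches from saved diagnose -r output, here is a minimal Python sketch. It assumes the warning text has exactly the format quoted above. The function name find_proc_mismatches is hypothetical, and "not seen by Maui" just restates the interpretation given above:

```python
import re

# Matches per-reservation warnings in the format quoted above, e.g.
#   WARNING:  reservation 'ib.2' has 8 proc(s) allocated but 5 detected
WARNING_RE = re.compile(
    r"reservation '([^']+)' has (\d+) proc\(s\) allocated but (\d+) detected"
)


def find_proc_mismatches(diagnose_output: str) -> list:
    """Return (reservation id, allocated procs, detected procs) per warning."""
    mismatches = []
    for match in WARNING_RE.finditer(diagnose_output):
        res_id, allocated, detected = match.groups()
        mismatches.append((res_id, int(allocated), int(detected)))
    return mismatches


# The two warning lines from this thread, used as sample input.
sample = """\
WARNING:  reservation '361802' has 1 proc(s) allocated but 0 detected
WARNING:  reservation 'ib.2' has 8 proc(s) allocated but 5 detected
"""

for res_id, allocated, detected in find_proc_mismatches(sample):
    print(f"{res_id}: {allocated - detected} proc(s) not seen by Maui")
```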

> Moreover, I tried to create a reservation of a single machine, which is 
> reported as follows:
> 
> ib.2                       User DEF   -00:01:55     3:58:05      4:00:00 

>     1    1    8
>      Flags: PREEMPTEE
>      ACL: RES==ib= CLASS==ib+
>      CL:  RES==ib
>      Task Resources: PROCS: [ALL]
>      Attributes (HostList='node1'   MaxTasks=1)
>      Active PH: 0.11/0.30 (37.50%)
> WARNING:  reservation 'ib.2' has 8 proc(s) allocated but 5 detected
> 
> Again, what does the last line mean? This 8-processor machine is 
> currently occupied by 8 jobs (which were allocated to it before the 
> reservation). I cannot make any sense of the '5 detected' part.

Here, likewise, it means that all 8 of the machine's processors are actually
occupied by jobs, but Maui thinks that only 5 of them are.
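The following toy model shows one way a "5 detected" figure could come about. It is a sketch under stated assumptions, not Maui's actual accounting code. It assumes that a job from a queue with resources_max.ncpus = 0 (the misconfiguration described further down) counts as 0 processors, and that, hypothetically, 3 of the 8 single-processor jobs on node1 came from such a queue. The queue name "batch" and the helper names are made up:

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    queue: str
    real_procs: int  # processors the job actually occupies on the node


# Hypothetical per-queue limits; 0 mirrors the misconfigured dgiseq queue.
QUEUE_NCPUS_MAX = {"dgiseq": 0}


def procs_seen_by_scheduler(job: Job) -> int:
    """Assumed model: a queue whose ncpus limit is 0 makes its jobs count as 0 procs."""
    if QUEUE_NCPUS_MAX.get(job.queue) == 0:
        return 0
    return job.real_procs


# Hypothetical mix: 8 single-processor jobs on node1, 3 of them from dgiseq.
jobs = [Job(f"j{i}", "dgiseq" if i < 3 else "batch", 1) for i in range(8)]

actual = sum(job.real_procs for job in jobs)
detected = sum(procs_seen_by_scheduler(job) for job in jobs)
print(f"{actual} proc(s) allocated but {detected} detected")
```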

> I also don't understand "MaxTasks=1".

(This one is still unresolved.)

> The output of diagnose -r ends with
> 
> Active Reserved Processors: 164
> WARNING:  reservation table is corrupt:  active procs reserved does not 
> equal active procs detected (164 != 74)
> 
> Should I be concerned about this?

Yes.
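You can pull the summary numbers out and compute the gap with a small Python sketch. It assumes the summary warning has the wording quoted above and tolerates the line break that e-mail wrapping introduced. The helper name reservation_table_gap is hypothetical:

```python
import re

# \s+ accepts either a single space or the line break introduced by wrapping.
SUMMARY_RE = re.compile(
    r"active procs reserved does not\s+equal active procs detected "
    r"\((\d+) != (\d+)\)"
)


def reservation_table_gap(diagnose_output):
    """Return reserved minus detected active procs, or None if no warning."""
    match = SUMMARY_RE.search(diagnose_output)
    if match is None:
        return None
    reserved, detected = map(int, match.groups())
    return reserved - detected


summary = (
    "WARNING:  reservation table is corrupt:  active procs reserved does not\n"
    "equal active procs detected (164 != 74)"
)
print(reservation_table_gap(summary))  # 90
```

For the numbers in this thread, 164 - 74 = 90 reserved processors are not backed by processors that Maui detects as active.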

The cause of the above phenomena was a misconfigured queue:

set queue dgiseq resources_max.ncpus = 0
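The following is a hedged Python sketch for spotting this misconfiguration in saved qmgr output. It assumes the output lists queue attributes one per line in the same "set queue ..." form as the line above. Apart from that dgiseq line, the sample configuration is made up, and the helper name queues_with_zero_ncpus is hypothetical:

```python
import re

# Matches lines of the form shown above, e.g.
#   set queue dgiseq resources_max.ncpus = 0
NCPUS_ZERO_RE = re.compile(
    r"^set queue (\S+) resources_max\.ncpus = 0\s*$", re.MULTILINE
)


def queues_with_zero_ncpus(qmgr_output: str) -> list:
    """Return the names of queues whose resources_max.ncpus is set to 0."""
    return NCPUS_ZERO_RE.findall(qmgr_output)


# Illustrative input; only the ncpus line is taken from this thread.
config = """\
create queue dgiseq
set queue dgiseq queue_type = Execution
set queue dgiseq resources_max.ncpus = 0
set queue dgiseq enabled = True
"""
print(queues_with_zero_ncpus(config))  # ['dgiseq']
```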

The jobs submitted to this queue were (amazingly) executing. However, from
Maui's viewpoint they were not consuming any processor resources. This had
the nasty side effect of Maui assigning jobs to nodes that were already
fully occupied, even though other, genuinely free nodes were available. The
wrongly assigned jobs could not run, and they also prevented jobs with lower
priority from running.
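The next snippet is a simplified Python sketch of why an undercount sends jobs to busy nodes. It uses a hypothetical first-fit placement that looks only at the scheduler's view of free processors. First-fit is a stand-in chosen for illustration, not Maui's actual node allocation policy. node1 mirrors the ib.2 situation (8 busy, 5 seen), and node2 is a hypothetical idle node:

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    total_procs: int
    real_busy: int  # processors actually occupied by running jobs
    seen_busy: int  # processors the scheduler believes are occupied

    def seen_free(self) -> int:
        return self.total_procs - self.seen_busy

    def real_free(self) -> int:
        return self.total_procs - self.real_busy


def first_fit(nodes, procs_needed):
    """Hypothetical placement: first node that looks free enough to the scheduler."""
    for node in nodes:
        if node.seen_free() >= procs_needed:
            return node
    return None


nodes = [
    Node("node1", total_procs=8, real_busy=8, seen_busy=5),  # like ib.2
    Node("node2", total_procs=8, real_busy=0, seen_busy=0),  # hypothetical idle node
]

chosen = first_fit(nodes, procs_needed=1)
print(chosen.name, "really free:", chosen.real_free())  # node1 really free: 0
```

Because node1 appears to have 3 free processors, the 1-processor job lands there even though node2 is completely idle.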

-JPL
