[Mauiusers] insufficient idle procs available?

Itay M itaym.tau at gmail.com
Tue Jan 29 14:00:31 MST 2008


Here is the diagnose -j on these two jobs that are running on node28:
/==============================/
diagnose -j 228620
Name                  State Par Proc QOS     WCLimit R  Min     User
Group  Account  QueuedTime  Network  Opsys   Arch    Mem   Disk  Procs
Class Features

228620              Running DEF    1 low 10:00:00:00 1    1    ad_user
pu_group        -     2:49:41   [NONE] [NONE] [NONE]    >=0    >=0    NC0
[heavy:1] [NONE]
WARNING:  job '228620' utilizes more memory than dedicated (3432 > 512)

diagnose -j 228621
Name                  State Par Proc QOS     WCLimit R  Min     User
Group  Account  QueuedTime  Network  Opsys   Arch    Mem   Disk  Procs
Class Features

228621              Running DEF    1 low 10:00:00:00 1    1    ad_user
pu_group       -     2:49:41   [NONE] [NONE] [NONE]    >=0    >=0    NC0
[heavy:1] [NONE]
WARNING:  job '228621' utilizes more memory than dedicated (3595 > 512)
/==============================/

And here is the checkjob -v on these two jobs:

/==============================/

checking job 228620 (RM job '228620.cluster')
State: Running
Creds:  user:ad_user  group:pu_group  class:heavy  qos:low
WallTime: 6:31:31 of 10:00:00:00
SubmitTime: Tue Jan 29 16:14:14
  (Time Queued  Total: 00:00:01  Eligible: 00:00:01)
StartTime: Tue Jan 29 16:14:15
Total Tasks: 1
Req[0]  TaskCount: 1  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Exec:  ''  ExecSize: 0  ImageSize: 0
Dedicated Resources Per Task: PROCS: 1  MEM: 512M
Utilized Resources Per Task:  PROCS: 0.13  MEM: 34.32  SWAP: 35.44
Avg Util Resources Per Task:  PROCS: 0.10
Max Util Resources Per Task:  PROCS: 0.13  MEM: 34.32  SWAP: 35.44
Average Utilized Memory: 3408.54 MB
Average Utilized Procs: 0.61
NodeAccess: SHARED
NodeCount: 1
Allocated Nodes:
[node28:1]
Task Distribution: node28

IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
SystemQueueTime: Tue Jan 29 19:53:18
Flags:       RESTARTABLE
Reservation '228620' (-6:31:19 -> 9:17:28:41  Duration: 10:00:00:00)
PE:  1.00  StartPriority:  200


checking job 228621 (RM job '228621.cluster')

State: Running
Creds:  user:ad_user  group:pu_group  class:heavy  qos:low
WallTime: 6:24:00 of 10:00:00:00
SubmitTime: Tue Jan 29 16:22:46
  (Time Queued  Total: 00:00:01  Eligible: 00:00:01)

StartTime: Tue Jan 29 16:22:47
Total Tasks: 1
Req[0]  TaskCount: 1  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
Exec:  ''  ExecSize: 0  ImageSize: 0
Dedicated Resources Per Task: PROCS: 1  MEM: 512M
Utilized Resources Per Task:  PROCS: 0.10  MEM: 35.95  SWAP: 39.56
Avg Util Resources Per Task:  PROCS: 0.08
Max Util Resources Per Task:  PROCS: 0.10  MEM: 35.95  SWAP: 39.56
Average Utilized Memory: 3561.67 MB
Average Utilized Procs: 0.58
NodeAccess: SHARED
NodeCount: 1
Allocated Nodes:
[node28:1]
Task Distribution: node28

IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
SystemQueueTime: Tue Jan 29 19:53:18

Flags:       RESTARTABLE

Reservation '228621' (-6:23:49 -> 9:17:36:11  Duration: 10:00:00:00)
PE:  1.00  StartPriority:  200


/==============================/

What does the 0:4 mean?
Could this be related to the way the user is running the job itself (the one
that qsub executes)? Or should I check something on the nodes, e.g. something
related to the load average? Something else?
BTW, almost all of our jobs get the 'WARNING:  job '{job_id}' utilizes more
memory than dedicated (xxxx > 512)' message. Should I change the default
memory assigned to jobs? Currently the default is 512MB.
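In case it matters, this is what I was planning to run on node28 to see how
many processors Maui thinks are dedicated there (checknode and diagnose are
the standard Maui commands; node28 is just the node from the output above):

/==============================/
# Per-node view: configured vs. dedicated processors and memory on node28
checknode node28

# Same dedicated:configured processor counts, one line per node
diagnose -n | grep node28
/==============================/

If checknode shows all of node28's processors dedicated while only these two
single-processor jobs are running there, that would fit Jan's reading of the
0:4.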

On Jan 29, 2008 10:36 PM, Jan Ploski <Jan.Ploski at offis.de> wrote:

>
>
>
> Can you also report the output of checkjob and diagnose -j on these 2
> jobs? Do they also have the MEM requirement?
>
> > About the MEM requirement: do you mean I should unset it? Other than that,
> > we don't use any MEM requirement in our qsub script.
>
> Well, it must be coming from somewhere, quite possibly from a default in
> the queue or server configuration. So I'd try unsetting it there.
> However, looking at the diagnose -n output above makes me think it is
> processor related - judging from the 0:4, for some unknown reason your
> jobs consume 2 processors each rather than 1.
>
> Regards,
> Jan Ploski
>
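If the 512MB default really does come from the queue or server configuration,
then (assuming the resource manager behind Maui here is TORQUE/PBS) it should
be possible to inspect and change it with qmgr. The queue name 'heavy' is
taken from the checkjob output above, and the 4gb value is only an example:

/==============================/
# Show the current per-queue and server-wide settings
qmgr -c "list queue heavy"
qmgr -c "print server"

# Either remove the 512MB per-job default ...
qmgr -c "unset queue heavy resources_default.mem"

# ... or raise it to something closer to what the jobs actually use
qmgr -c "set queue heavy resources_default.mem = 4gb"
/==============================/

Raising resources_default.mem (or unsetting it and having users request memory
explicitly) should also make the "utilizes more memory than dedicated"
warnings go away for jobs that really use ~3.5GB.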