Itay M wrote:
> Here is the diagnose -j on these two jobs that are running on node28:
(looking as expected, 1 Proc, 1 class initializer from heavy consumed 
for each job)

> And here is the checkjob -v on these two jobs:
(also nothing strange in there)

> what does the 0:4 means?

It means that according to Maui 0 out of 4 processors are available...

> Could this be related to the way in which the user is running the job 
> itself (the one that qsub runs) ?
> Or should I check something in the nodes? something related to load 
> average? else?

...It is very possible that it is related to the load average. Are these 
2 jobs multithreaded? Is the load ~4 while it should be ~2? I think 
maybe this message explains what is happening in your cluster: 

See also the description of NODEAVAILABILITYPOLICY in the Maui manual, 
especially the default setting which says:

"The node is considered busy if either dedicated or utilized resources 
equal or exceed configured resources"

So maybe Maui is simply not starting jobs because it correctly detects 
that the processors are overcommitted. Perhaps your 'heavy' job users 
should request nodes=1:ppn=2 if each of their jobs is indeed adding ~2 
to the load.
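
To make the arithmetic concrete, here is a small Python sketch of the rule 
quoted above. It is only an illustration, not Maui's actual code: the 
function name is made up, and it assumes that the load average is what 
counts as "utilized" processors on a 4-processor node.

```python
def node_is_busy(configured, dedicated, utilized):
    """Quoted rule: busy if dedicated OR utilized resources
    equal or exceed the configured resources."""
    return dedicated >= configured or utilized >= configured


CONFIGURED_PROCS = 4  # node28 reports 4 processors ("0:4")

# Two 'heavy' jobs, each requesting 1 proc but each causing load ~2
# (assumption: load average stands in for utilized processors).
print(node_is_busy(CONFIGURED_PROCS, dedicated=2, utilized=4.0))  # True

# The same two jobs if each really used only one processor.
print(node_is_busy(CONFIGURED_PROCS, dedicated=2, utilized=2.0))  # False

# The same two jobs if each requested ppn=2: the node is full by request.
print(node_is_busy(CONFIGURED_PROCS, dedicated=4, utilized=4.0))  # True
```

In the first case the node is busy only because of utilization, which 
would explain why nothing else starts although just 2 processors are 
dedicated. In the last case the requests match what the jobs actually use.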

> BTW, almost all of our jobs have the 'WARNING:  job '{job_id}' utilizes 
> more memory than dedicated (xxxx > 512)  . Should I change the default 
> memory assigned for the jobs? Currently the default is 512MB.

Well, apparently your 'heavy' jobs are consuming much more memory than 
that. I think this may also be a reason why the new jobs are not getting 
started - if they are requesting 512 MB, but there is not enough free 
memory left. If you increase this requirement, Maui is likely to become 
even more conservative about starting jobs. But this may be a sensible 
thing to do - you don't want your jobs to overcommit memory and the node 
to start swapping (check the vmstat output on the node to see whether it 
is already swapping/paging).
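
If you want to automate the vmstat check, something along these lines 
could work. It is a rough sketch that assumes the usual Linux vmstat 
layout with "si" and "so" (swap-in/swap-out) columns; the function names 
and the sample numbers are made up for illustration.

```python
def swap_samples(vmstat_text):
    """Return a list of (si, so) pairs, one per vmstat sample row."""
    columns = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            columns = fields  # the row that names the columns
            continue
        if columns is None or len(fields) != len(columns):
            continue  # banner row or anything malformed
        if not all(f.isdigit() for f in fields):
            continue
        row = dict(zip(columns, map(int, fields)))
        samples.append((row["si"], row["so"]))
    return samples


def is_swapping(vmstat_text, skip_first=True):
    """True if any sample shows swap-in or swap-out activity.

    The first vmstat row reports averages since boot, so it is
    skipped by default.
    """
    samples = swap_samples(vmstat_text)
    if skip_first:
        samples = samples[1:]
    return any(si > 0 or so > 0 for si, so in samples)


# Made-up sample in the shape of "vmstat 5 2" output.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 4  0 204800  10240   1024  20480    0    0    12    30  100  200 90  5  5  0  0
 5  1 307200   8192   1024  16384  850 1200   900  1300  400  600 60 10  5 25  0
"""

print(swap_samples(SAMPLE))  # [(0, 0), (850, 1200)]
print(is_swapping(SAMPLE))   # True
```

On a real node you would feed it the text produced by running something 
like "vmstat 5 3" there. Sustained nonzero si/so values while the heavy 
jobs run would suggest that memory is already overcommitted.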

Jan Ploski

