[Mauiusers] MSysRegEvent(JOBRESVIOLATION: job '850416' in state 'Running' has exceeded PROC resource limit (1618 > 100) (action CANCEL will be taken)

Michel Béland michel.beland at rqchp.qc.ca
Tue Jan 5 09:32:41 MST 2010


Sabuj Pattanayek wrote:

> I've unset all the resources_*.nodes settings for the queue

Rightly so, since min and max on these do not make any sense to PBS: 
they are treated as text resources.

> and gotten
> maui/torque to actually read the ncpus= value from the PBS script file
> since this is actually what sets the PROC limit.

The ncpus limit is useful only for single-node jobs. If you ask for 
-lncpus=32 while your nodes have only 16 cores, your jobs will be 
rejected for sure.
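
To make the arithmetic concrete, here is a rough sketch (not anything 
Maui or Torque provides) of how one might choose between the two request 
forms. The helper name resource_request and the default of 16 cores per 
node are assumptions for illustration only.

```python
import math

def resource_request(total_cores, cores_per_node=16):
    """Build a qsub -l resource string for total_cores (hypothetical helper).

    cores_per_node=16 is an assumption matching the example above.
    """
    if total_cores <= cores_per_node:
        # The request fits on one node, so ncpus is meaningful.
        return f"ncpus={total_cores}"
    # The request spans nodes: ask for whole nodes with nodes=N:ppn=M.
    nodes = math.ceil(total_cores / cores_per_node)
    return f"nodes={nodes}:ppn={cores_per_node}"

print(resource_request(8))    # ncpus=8
print(resource_request(32))   # nodes=2:ppn=16
```

Note that the sketch rounds up to whole nodes, so a core count that is 
not a multiple of cores_per_node requests more cores than asked for.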

> According to the
> source src/moab/MJob.c line 11483, ncpus is tied to the PROC limit.
> Also the comment in src/moab/MPBSI.c on line 1676 validates this.
> However, for some reason ncpus was not being read previously from the
> pbs script file even when all references to resources_*.ncpus were
> unset in qmgr.

If Maui does not set the PROC limit without -lncpus, I consider this a 
bug. The PROC column in the output of showq is correct, though, with 
-lnodes=1:ppn=16.

But before giving up, make sure that you have "JOBNODEMATCHPOLICY 
EXACTNODE" in maui.cfg.

> Now that I know how to get the PROC resource limit to change, I'll
> play around with the qmgr settings again to try to set limits before a
> user even submits a job. Then I'll test to see if these limits are
> enforced for MPI programs (across nodes).

I think that you will get nowhere with ncpus on a cluster, unless you 
use these limits only for single-node jobs.

-- 
Michel Béland, analyste en calcul scientifique
michel.beland at rqchp.qc.ca
bureau S-250, pavillon Roger-Gaudry (principal), Université de Montréal
téléphone : 514 343-6111 poste 3892     télécopieur : 514 343-2155
RQCHP (Réseau québécois de calcul de haute performance)  www.rqchp.qc.ca

