[Mauiusers] MSysRegEvent(JOBRESVIOLATION: job '850416' in state 'Running' has exceeded PROC resource limit (1618 > 100) (action CANCEL will be taken)
Michel Béland
michel.beland at rqchp.qc.ca
Tue Jan 5 09:32:41 MST 2010
Sabuj Pattanayek wrote:
> I've unset all the resources_*.nodes settings for the queue
Rightly so, since min and max on these do not make any sense to PBS,
which treats them as text resources.
> and gotten
> maui/torque to actually read the ncpus= value from the PBS script file
> since this is actually what sets the PROC limit.
The ncpus limit is useful only for single-node jobs. If you request
-lncpus=32 while your nodes have only 16 cores each, your job will
certainly be rejected.
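
To illustrate the point about single-node requests, here is a minimal
sketch of a job script header that asks for 32 processors as two nodes
of 16 cores each instead of -lncpus=32. The node and core counts are
assumptions chosen to match the 16-core example; the fallback to
/dev/null is only there so the script also runs outside of a job.

```shell
#!/bin/sh
# Request 2 nodes with 16 processors per node (32 processors in total),
# rather than "-l ncpus=32", which cannot fit on a single 16-core node.
#PBS -l nodes=2:ppn=16

# Inside a TORQUE job, PBS_NODEFILE names a file with one line per
# allocated processor slot. Outside a job the variable is unset, so
# fall back to /dev/null and report 0.
NPROCS=$(wc -l < "${PBS_NODEFILE:-/dev/null}")
echo "allocated processor slots: $NPROCS"
```

Submitted with qsub, this should report 32 slots if the scheduler grants
the request as written.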
> According to the
> source src/moab/MJob.c line 11483, ncpus is tied to the PROC limit.
> Also the comment in src/moab/MPBSI.c on line 1676 validates this.
> However, for some reason ncpus was not being read previously from the
> pbs script file even when all references to resources_*.ncpus were
> unset in qmgr.
If Maui does not set the PROC limit without -lncpus, I consider that a
bug. The PROC column in the output of showq is correct, though, with
-lnodes=1:ppn=16.
But before giving up, make sure that you have "JOBNODEMATCHPOLICY
EXACTNODE" in maui.cfg.
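
A quick way to check for that setting is sketched below. The path
/usr/local/maui/maui.cfg is an assumption (a common default install
location), and MAUI_CFG is a hypothetical variable used here only to
override it; adjust both to your site.

```shell
#!/bin/sh
# Assumed location of the Maui configuration file; override by setting
# MAUI_CFG (a name made up for this sketch) before running.
MAUI_CFG="${MAUI_CFG:-/usr/local/maui/maui.cfg}"

# Look for an uncommented "JOBNODEMATCHPOLICY EXACTNODE" line.
# The match is case-insensitive in case the file is capitalized differently.
if grep -Eiq '^[[:space:]]*JOBNODEMATCHPOLICY[[:space:]]+EXACTNODE' "$MAUI_CFG" 2>/dev/null; then
    status="set"
else
    status="missing"
fi
echo "JOBNODEMATCHPOLICY EXACTNODE: $status"
```

If it reports "missing", the line to add to maui.cfg is the one quoted
above: JOBNODEMATCHPOLICY EXACTNODE.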
> Now that I know how to get the PROC resource limit to change, I'll
> play around with the qmgr settings again to try to set limits before a
> user even submits a job. Then I'll test to see if these limits are
> enforced for MPI programs (across nodes).
I think that you will get nowhere with ncpus on a cluster unless you
use these limits only for single-node jobs.
--
Michel Béland, analyste en calcul scientifique
michel.beland at rqchp.qc.ca
bureau S-250, pavillon Roger-Gaudry (principal), Université de Montréal
téléphone : 514 343-6111 poste 3892 télécopieur : 514 343-2155
RQCHP (Réseau québécois de calcul de haute performance) www.rqchp.qc.ca