[Mauiusers] MSysRegEvent(JOBRESVIOLATION: job '850416' in state 'Running' has exceeded PROC resource limit (1618 > 100) (action CANCEL will be taken)
sabujp at gmail.com
Wed Dec 30 15:39:38 MST 2009
On Tue, Dec 29, 2009 at 4:16 PM, <Gareth.Williams at csiro.au> wrote:
> Unset resources_max.nodes and resources_min.nodes.
> These are text fields, and min/max only makes sense if they are the same (and even then, not much).
> While you're at it, set resources_default.nodes to just 1, as ppn=1 is the default anyway.
> However, I thought ncpus was the best bet, given the PROC=1 limit reported in the log.
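For reference, the queue changes suggested above might look something like the following qmgr commands. This is a sketch that assumes a queue named `batch` (hypothetical; substitute your own queue name) and must be run on the Torque server host with manager privileges.

```shell
# Hypothetical queue name "batch"; replace with the real queue.
# Drop the min/max node-spec limits, which are compared as text.
qmgr -c "unset queue batch resources_max.nodes"
qmgr -c "unset queue batch resources_min.nodes"
# Default to a single node; ppn=1 is implied when ppn is not given.
qmgr -c "set queue batch resources_default.nodes = 1"
# Print the queue's attributes to confirm the changes took effect.
qmgr -c "list queue batch"
```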
I've unset all the resources_*.nodes settings for the queue and gotten
maui/torque to actually read the ncpus= value from the PBS script file,
which is what actually sets the PROC limit. According to the source
(src/moab/MJob.c, line 11483), ncpus is tied to the PROC limit, and the
comment at line 1676 of src/moab/MPBSI.c confirms this.
However, for some reason ncpus was previously not being read from the
PBS script file, even when all references to resources_*.ncpus were
unset in qmgr.
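A minimal job script that requests processors through ncpus, the value described above as setting the PROC limit, might look like this. The job name, walltime, processor count and `./my_program` are hypothetical placeholders, not taken from the original setup.

```shell
#!/bin/bash
#PBS -N ncpus_test
# Request 4 processors via ncpus; per the Moab/Maui source cited above,
# this is the value tied to the job's PROC limit.
#PBS -l ncpus=4
#PBS -l walltime=00:10:00

# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"
./my_program   # hypothetical executable
```

Submit it with `qsub` and inspect the resulting job with Maui's `checkjob <jobid>` to see whether the expected PROC value was picked up.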
Now that I know how to change the PROC resource limit, I'll
experiment with the qmgr settings again to try to set limits before a
user even submits a job. Then I'll test whether these limits are
enforced for MPI programs running across nodes.
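Setting ncpus limits at the queue level, before any job is submitted, could be sketched with qmgr as follows. The queue name `batch` and the values 1 and 8 are hypothetical, and whether Maui honors these queue defaults as the PROC limit is exactly what remains to be tested.

```shell
# Hypothetical queue "batch" and hypothetical limit values.
# Jobs that do not request ncpus get this default.
qmgr -c "set queue batch resources_default.ncpus = 1"
# Upper bound on ncpus a job may request in this queue.
qmgr -c "set queue batch resources_max.ncpus = 8"
```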