[Mauiusers] Re: [OMPI users] [torqueusers] Job dies randomly, but only through torque

Jim Kusznir jkusznir at gmail.com
Mon Jun 2 10:13:05 MDT 2008

I did turn off resource enforcement (cancel), and the jobs are running
properly now.

The load numbers in the log below are multiplied by 100.  I personally
observed that the "372" corresponded to a node load of 3.72 according to w/top/etc.
What bothers me is that maui believes the job is only entitled to 100
(1.00, or a single CPU).  It definitely scheduled the job on the
requested 4 CPUs, and the job was submitted with both (on separate
occasions) nodes=4:ppn=1 and nodes=1:ppn=4, both with identical

I don't recall ever setting the "Resource_List.ncpus=1", and I didn't
find that in maui.cfg; is there somewhere else I should be looking for

Thanks everyone for your help!


On Thu, May 29, 2008 at 12:08 PM, Jan Ploski <Jan.Ploski at offis.de> wrote:
> Jim Kusznir wrote:
>> I have verified that maui is killing the job.  I actually ran into
>> this with another user all of a sudden.  I don't know why it's only
>> affecting a few currently.  Here's the maui log extract for a current
>> run of this user's program:
> ...
>> maui.log:05/29 09:27:21 INFO:     job 2120 exceeds requested proc
>> limit (3.72 > 1.00)
>> maui.log:05/29 09:27:21 MSysRegEvent(JOBRESVIOLATION:  job '2120' in
>> state 'Running' has exceeded PROC resource limit (372 > 100) (action
>> CANCEL will be taken)  job start time: Thu May 29 09:26:19
> ...
> Here is a little theory that I think fits your present observations:
> 1. You have "Resource_List.ncpus = 1" (I think this is what Maui calls the
> PROC resource limit.)
> 2. You also have the Maui configuration parameter RESOURCELIMITPOLICY set to
> 3. The job's executable starts multiple threads or subprocesses (perhaps
> instead of distributing them to all the remaining nodes?)
> Therefore Maui shoots it down, having noticed that it uses more than it is
> supposed to. It would help towards a solution if you could verify whether
> these points are true (1 => qstat -f, 2 => view config, 3 => use top or ps).
> Regards,
> Jan Ploski
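
For anyone repeating the checks discussed in this thread, here is a minimal
Python sketch. It assumes the event line looks like the maui.log extract
quoted above and that the PROC figures are in hundredths of a CPU, as
observed for this job. The function names parse_proc_violation and
resource_list are hypothetical helpers, not part of Maui or Torque, and the
second one only scans text in the "Resource_List.name = value" form that
qstat -f prints; it does not call qstat itself.

```python
import re

# Matches the JOBRESVIOLATION event text quoted in this thread.
# Assumption: both numbers are CPU usage scaled by 100 (372 means 3.72).
VIOLATION_RE = re.compile(
    r"job '(?P<job>[^']+)'.*?"
    r"exceeded PROC resource limit \((?P<used>\d+) > (?P<limit>\d+)\)"
)


def parse_proc_violation(text):
    """Return (job_id, used_cpus, limit_cpus), or None if no event is found."""
    # Mail clients wrap long log lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    match = VIOLATION_RE.search(flat)
    if match is None:
        return None
    used = int(match.group("used")) / 100
    limit = int(match.group("limit")) / 100
    return match.group("job"), used, limit


def resource_list(qstat_text):
    """Collect 'Resource_List.<name> = <value>' entries from qstat -f text."""
    entries = {}
    for line in qstat_text.splitlines():
        match = re.match(r"\s*Resource_List\.(\S+)\s*=\s*(.+?)\s*$", line)
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


if __name__ == "__main__":
    event = (
        "MSysRegEvent(JOBRESVIOLATION:  job '2120' in\n"
        "state 'Running' has exceeded PROC resource limit (372 > 100) (action"
    )
    print(parse_proc_violation(event))

    # Hypothetical qstat -f fragment, for illustration only.
    sample = "    Resource_List.ncpus = 1\n    Resource_List.nodes = 1:ppn=4\n"
    print(resource_list(sample))
```

If resource_list reports ncpus as 1 for a job that asked for 4 processors,
that would be consistent with point 1 of the theory above, though it does not
by itself show where the value was set.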

