[Mauiusers] Re: [OMPI users] [torqueusers] Job dies randomly, but only through torque

Jan Ploski Jan.Ploski at offis.de
Mon Jun 2 13:05:56 MDT 2008


Jim Kusznir wrote:
> I did turn off resource enforcement (cancel), and the jobs are running
> properly now.
> 
> The numbers below on load are being multiplied by 100.  I personally
> observed the "372" was a node load of 3.72 according to w/top/etc.
> What bothers me is that maui believes the job is only entitled to 100
> (1.00, or a single CPU).  It definitely scheduled the job on the
> requested 4 CPUs, and the job was submitted with both (on separate
> occasions) nodes=4:ppn=1 and nodes=1:ppn=4, both with identical
> results.
> 
> I don't recall ever setting the "Resource_List.ncpus=1", and I didn't
> find that in maui.cfg; is there somewhere else I should be looking for
> that?

Also check TORQUE's server and queue configuration with qmgr.
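A minimal sketch of that check follows. It assumes `qmgr` is on the PATH of the TORQUE server host and that `qmgr -c "print server"` prints the server and queue settings as `set ...` lines. The helper names `find_ncpus_settings` and `dump_server_config` are hypothetical. The script only filters for lines mentioning `ncpus`, such as a `resources_default.ncpus` default on a queue, which would be one way for a job to end up with `Resource_List.ncpus=1`.

```python
import subprocess


def find_ncpus_settings(qmgr_output: str) -> list[str]:
    """Return the qmgr 'set' lines that mention ncpus (hypothetical helper)."""
    matches = []
    for line in qmgr_output.splitlines():
        stripped = line.strip()
        # qmgr prints settings as "set server ..." or "set queue <name> ..."
        if stripped.startswith("set ") and "ncpus" in stripped:
            matches.append(stripped)
    return matches


def dump_server_config() -> str:
    """Run qmgr and return the printed server and queue configuration.

    Assumes qmgr is installed and the caller may query the server.
    """
    result = subprocess.run(
        ["qmgr", "-c", "print server"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

On the server host, `find_ncpus_settings(dump_server_config())` lists any matching server or queue settings. An empty list means the default is not coming from an explicit `ncpus` setting in qmgr.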

Regards,
Jan Ploski

