[Mauiusers] Re: [OMPI users] [torqueusers] Job dies randomly, but only through torque
Jan.Ploski at offis.de
Mon Jun 2 13:05:56 MDT 2008
Jim Kusznir wrote:
> I did turn off resource enforcement (cancel), and the jobs are running
> properly now.
> The numbers below on load are being multiplied by 100. I personally
> observed that the "372" corresponded to a node load of 3.72 according to w/top/etc.
> What bothers me is that maui believes the job is only entitled to 100
> (1.00, or a single CPU). It definitely scheduled the job on the
> requested 4 CPUs, and the job was submitted with both (on separate
> occasions) nodes=4:ppn=1 and nodes=1:ppn=4, both with identical
> I don't recall ever setting the "Resource_List.ncpus=1", and I didn't
> find that in maui.cfg; is there somewhere else I should be looking for
Also check TORQUE's server and queue configuration with qmgr.
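For a quick scan, something like the following could pick the ncpus-related attributes out of the qmgr dump. It is a minimal sketch, assuming the usual `set server <attr> = <value>` / `set queue <name> <attr> = <value>` layout printed by `qmgr -c "print server"`. The helper names `dump_torque_config` and `find_ncpus_settings` are hypothetical, made up for this example. A `resources_default.ncpus` on the queue or server is one plausible place a default of ncpus=1 could come from, but check your own site's output.

```python
import subprocess
from typing import List, Tuple


def dump_torque_config() -> str:
    """Hypothetical helper: run qmgr to print server and queue settings.

    Assumes the TORQUE client tools are installed and qmgr is on PATH.
    """
    result = subprocess.run(
        ["qmgr", "-c", "print server"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def find_ncpus_settings(qmgr_output: str) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: return (target, attribute, value) for every *.ncpus setting.

    Assumes lines of the form:
      set server <attr> = <value>
      set queue <name> <attr> = <value>
    """
    settings = []
    for raw in qmgr_output.splitlines():
        line = raw.strip()
        # Skip comments, "create queue ..." lines and anything without an assignment.
        if not line.startswith("set ") or "=" not in line:
            continue
        lhs, value = line.split("=", 1)
        parts = lhs.split()
        attr = parts[-1]
        if attr.endswith(".ncpus"):
            # Everything between "set" and the attribute names the target,
            # e.g. "server" or "queue batch".
            target = " ".join(parts[1:-1])
            settings.append((target, attr, value.strip()))
    return settings
```

On a head node you would call `find_ncpus_settings(dump_torque_config())` and look for any `resources_default.ncpus` or `resources_max.ncpus` entries.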