[torqueusers] Exceeded job limits on nodes

Garrick Staples garrick at usc.edu
Wed Mar 29 17:26:15 MST 2006


On Tue, Mar 28, 2006 at 03:32:34PM +0200, Piotr Siwczak alleged:
> 
> Hi,
> 
> I am running torque + maui on an Opteron cluster. A strange thing has been 
> happening recently with 3 of our nodes. All of them have:
> 
> 
> Node w38
>         state = free
>         np = 6
>         properties = lcgpro
>         ntype = cluster
>         jobs = 0/2408.fangorn.man.poznan.pl, 1/2409.fangorn.man.poznan.pl,
>                2/2410.fangorn.man.poznan.pl, 3/2626.fangorn.man.poznan.pl,
>                3/2625.fangorn.man.poznan.pl, 3/2624.fangorn.man.poznan.pl,
>                3/2623.fangorn.man.poznan.pl, 3/2622.fangorn.man.poznan.pl,
>                3/2621.fangorn.man.poznan.pl, 3/2620.fangorn.man.poznan.pl,
>                3/2619.fangorn.man.poznan.pl, 3/2618.fangorn.man.poznan.pl,
>                3/2617.fangorn.man.poznan.pl, 3/2616.fangorn.man.poznan.pl,
>                3/2615.fangorn.man.poznan.pl, 3/2614.fangorn.man.poznan.pl,
>                3/2613.fangorn.man.poznan.pl, 3/2612.fangorn.man.poznan.pl,
>                3/2611.fangorn.man.poznan.pl, 3/2610.fangorn.man.poznan.pl,
>                3/2609.fangorn.man.poznan.pl, 3/2608.fangorn.man.poznan.pl,
>                3/2607.fangorn.man.poznan.pl, 3/2606.fangorn.man.poznan.pl,
>                3/2605.fangorn.man.poznan.pl, 3/2604.fangorn.man.poznan.pl,
>                3/2603.fangorn.man.poznan.pl, 3/2602.fangorn.man.poznan.pl,
>                3/2601.fangorn.man.poznan.pl, 3/2600.fangorn.man.poznan.pl,
>                3/2599.fangorn.man.poznan.pl, 3/2598.fangorn.man.poznan.pl,
>                3/2597.fangorn.man.poznan.pl, 3/2596.fangorn.man.poznan.pl,
>                3/2595.fangorn.man.poznan.pl, 3/2594.fangorn.man.poznan.pl,
>                3/2593.fangorn.man.poznan.pl, 3/2592.fangorn.man.poznan.pl,
>                3/2591.fangorn.man.poznan.pl, 3/2590.fangorn.man.poznan.pl,
>                3/2589.fangorn.man.poznan.pl, 3/2588.fangorn.man.poznan.pl
> 
> 
> 
> As you can see from the excerpt above, the number of jobs far exceeds
> the number of slots. Furthermore, the node is still shown as "free". Does
> anyone have any idea what's going on here?

What state are the jobs in?  Are you perhaps using preemption?  Which
version of TORQUE are you running, and have you tried the latest stable
release?
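
To put numbers on the report: the quoted listing has 42 job entries on a
node with np = 6, and 39 of them (2588 through 2626) share slot 3. The
following is only a rough sketch of how such a record could be tallied,
assuming the text is pbsnodes-style "key = value" output with the jobs
value wrapped over several lines as in the mail. The function names
parse_node_record and slot_usage and the shortened SAMPLE record are
made up for illustration and are not part of TORQUE.

```python
# Hypothetical helper: tally job entries per slot in one node record.
# SAMPLE is a shortened copy of the record quoted above.
SAMPLE = """\
Node w38
        state = free
        np = 6
        jobs = 0/2408.fangorn.man.poznan.pl, 1/2409.fangorn.man.poznan.pl,
               2/2410.fangorn.man.poznan.pl, 3/2626.fangorn.man.poznan.pl,
               3/2625.fangorn.man.poznan.pl
"""


def parse_node_record(text):
    """Collect 'key = value' attributes; wrapped lines extend the last key."""
    attrs = {}
    key = None
    for raw in text.splitlines():
        line = raw.strip()
        if " = " in line:
            key, value = line.split(" = ", 1)
            key = key.strip()
            attrs[key] = value.strip()
        elif key is not None and line:
            # Continuation of a wrapped value (e.g. the long jobs list).
            attrs[key] += " " + line
    return attrs


def slot_usage(attrs):
    """Return (np, total job entries, {slot index: [job ids]})."""
    np_slots = int(attrs.get("np", 0))
    entries = [e.strip() for e in attrs.get("jobs", "").split(",") if e.strip()]
    per_slot = {}
    for entry in entries:
        slot, job_id = entry.split("/", 1)
        per_slot.setdefault(int(slot), []).append(job_id)
    return np_slots, len(entries), per_slot


attrs = parse_node_record(SAMPLE)
np_slots, total, per_slot = slot_usage(attrs)
# Slots holding more than one job entry are the suspicious ones.
stacked = {slot: jobs for slot, jobs in per_slot.items() if len(jobs) > 1}
print(f"np={np_slots} job entries={total} stacked slots={sorted(stacked)}")
```

Slots that show up in stacked are the ones worth checking first when
answering the questions above about job state and preemption.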

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California

