[torqueusers] Exceeded job limits on nodes

Piotr Siwczak psiwczak at man.poznan.pl
Tue Mar 28 06:32:34 MST 2006


Hi,

I am running Torque + Maui on an Opteron cluster. Something strange has
been happening recently with three of our nodes. All of them look like this:


Node w38
         state = free
         np = 6
         properties = lcgpro
         ntype = cluster
         jobs = 0/2408.fangorn.man.poznan.pl, 1/2409.fangorn.man.poznan.pl,
                2/2410.fangorn.man.poznan.pl, 3/2626.fangorn.man.poznan.pl,
                3/2625.fangorn.man.poznan.pl, 3/2624.fangorn.man.poznan.pl,
                3/2623.fangorn.man.poznan.pl, 3/2622.fangorn.man.poznan.pl,
                3/2621.fangorn.man.poznan.pl, 3/2620.fangorn.man.poznan.pl,
                3/2619.fangorn.man.poznan.pl, 3/2618.fangorn.man.poznan.pl,
                3/2617.fangorn.man.poznan.pl, 3/2616.fangorn.man.poznan.pl,
                3/2615.fangorn.man.poznan.pl, 3/2614.fangorn.man.poznan.pl,
                3/2613.fangorn.man.poznan.pl, 3/2612.fangorn.man.poznan.pl,
                3/2611.fangorn.man.poznan.pl, 3/2610.fangorn.man.poznan.pl,
                3/2609.fangorn.man.poznan.pl, 3/2608.fangorn.man.poznan.pl,
                3/2607.fangorn.man.poznan.pl, 3/2606.fangorn.man.poznan.pl,
                3/2605.fangorn.man.poznan.pl, 3/2604.fangorn.man.poznan.pl,
                3/2603.fangorn.man.poznan.pl, 3/2602.fangorn.man.poznan.pl,
                3/2601.fangorn.man.poznan.pl, 3/2600.fangorn.man.poznan.pl,
                3/2599.fangorn.man.poznan.pl, 3/2598.fangorn.man.poznan.pl,
                3/2597.fangorn.man.poznan.pl, 3/2596.fangorn.man.poznan.pl,
                3/2595.fangorn.man.poznan.pl, 3/2594.fangorn.man.poznan.pl,
                3/2593.fangorn.man.poznan.pl, 3/2592.fangorn.man.poznan.pl,
                3/2591.fangorn.man.poznan.pl, 3/2590.fangorn.man.poznan.pl,
                3/2589.fangorn.man.poznan.pl, 3/2588.fangorn.man.poznan.pl



As you can see from the above excerpt, the number of jobs far exceeds the
number of slots: 42 jobs are assigned against np = 6, and 39 of them are
stacked on slot 3. Furthermore, the node is still shown as "free". Does
anyone have any idea what's going on here?
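
For anyone who wants to check other nodes for the same symptom, here is a
minimal Python sketch. It assumes the jobs value is a comma-separated list
of slot/jobid pairs, as in the listing above. The names slot_usage and
is_oversubscribed are made up for illustration and are not part of Torque
or Maui, and the sample string is a shortened copy of the w38 entry.

```python
from collections import Counter


def slot_usage(jobs_field):
    """Count how many jobs sit on each slot index in a 'jobs = ...' value."""
    counts = Counter()
    for entry in jobs_field.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty fields and trailing commas
        slot, _, _job_id = entry.partition("/")
        counts[int(slot)] += 1
    return counts


def is_oversubscribed(jobs_field, np_slots):
    """True if there are more jobs than slots, or any slot holds more than one job."""
    usage = slot_usage(jobs_field)
    total = sum(usage.values())
    return total > np_slots or any(n > 1 for n in usage.values())


# Shortened sample in the same format as the w38 listing (assumption: the
# real value would be taken from the node status output).
sample = (
    "0/2408.fangorn.man.poznan.pl, 1/2409.fangorn.man.poznan.pl, "
    "2/2410.fangorn.man.poznan.pl, 3/2626.fangorn.man.poznan.pl, "
    "3/2625.fangorn.man.poznan.pl, 3/2624.fangorn.man.poznan.pl"
)

print(dict(slot_usage(sample)))
print(is_oversubscribed(sample, np_slots=6))
```

This only detects the condition; it says nothing about why the server
keeps assigning jobs to slot 3 or why the node state stays "free".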

Piotr


  --
  Piotr Siwczak <psiwczak at man.poznan.pl>
  System Administrator

  Poznan Supercomputing and Networking Center
  Supercomputing Department

  (www.eu-egee.org <piotr.siwczak at cern.ch>)
  --

