[Mauiusers] [torqueusers] Increased np but no change in number of jobs running on them

skip at pobox.com
Sun Mar 27 12:35:01 MDT 2011


    >> I have a couple of four-core desktop machines which I normally
    >> define to have only two processors each, giving me enough breathing
    >> room to do interactive work on them.  I bumped both of them up to
    >> four processors a few minutes ago using qmgr.  I notice that the
    >> .../server_priv/nodes file still says they have only two
    >> processors.  (I've already opened a bug report about this.)  How long
    >> should it be before Torque or Maui notices the extra available
    >> processors and puts them to use?

    Mgr> What version are you using? 2.4 works just fine.

I'm using 2.4.8.  Thanks for the feedback.

    >> As a corollary, should it be necessary to pick up this change, is it
    >> safe to restart the pbs_server and maui processes while there are
    >> jobs running on the execution nodes?

    Mgr> Not sure about maui, but pbs_server can be safely restarted using
    Mgr> qterm -t quick.

Thanks, but that didn't really work.  It killed the pbs_server process, and
I was able to restart it.  However, even though jobs are running on all 75
cores I have available (still missing the couple of extra cores I tried to
allocate earlier), qnodes reports some nodes as free when they clearly have
no free processors:

    % qnodes -l all
    udesktop267.wacker   free
    userver133.wacker    free
    userver211.wacker    free
    userver209.wacker    job-exclusive
    us306.wacker         job-exclusive
    us254.wacker         job-exclusive
    userver121.wacker    job-exclusive
    us323.wacker         job-exclusive
    udesktop264.wacker   free
    % sudo cat /var/spool/torque/server_priv/nodes
    udesktop267.wacker np=4
    userver133.wacker np=4
    userver211.wacker np=8
    userver209.wacker np=8
    us306.wacker np=12
    us254.wacker np=24
    userver121.wacker np=4
    us323.wacker np=12
    udesktop264.wacker np=4
    % qwhere
       2 udesktop264
       1 udesktop267
      24 us254
      12 us306
      12 us323
       4 userver121
       4 userver133
       8 userver209
       8 userver211

The qwhere command is a simple bash function:

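    # Count running jobs per execution host.
    qwhere ()
    {
        # Keep jobs in state R, take column 12 (the exec host), then tally.
        qstat -1n | grep ' R ' | awk '{print $12}' | sort | uniq -c
    }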
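
To make the mismatch between the nodes file and the running jobs explicit,
the two listings above can be joined.  The following is only a minimal
sketch, not part of Torque or Maui: free_slots is a hypothetical helper name,
and it assumes nodes-file lines of the form "host.domain np=N" and input
lines of the form "count host", as in the transcript above.

```shell
# Hypothetical helper (not a Torque command): print "host free_slots" for
# every host in a nodes file, given per-host running-job counts on stdin.
#   $1     path to a nodes file with lines like "host.domain np=N"
#   stdin  "count host" lines, in the format printed by qwhere
free_slots ()
{
    awk '
        NR == FNR {                       # first input: the nodes file
            split($1, h, ".")             # h[1] is the short host name
            for (i = 2; i <= NF; i++)
                if ($i ~ /^np=/)
                    np[h[1]] = substr($i, 4)
            next
        }
        { used[$2] = $1 }                 # second input: "count host"
        END {
            # Hosts with no running jobs count as zero slots in use.
            for (host in np)
                printf "%s %d\n", host, np[host] - used[host]
        }
    ' "$1" - | sort
}
```

It would be used as "qwhere | free_slots /var/spool/torque/server_priv/nodes"
(run with enough privilege to read that file).  A node printed with 0 free
slots that qnodes still lists as free is one whose reported state disagrees
with the jobs running on it.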

Is it possible that a node's state is not updated until a job completes on
it?

S

