[torqueusers] [Fwd: priority queue/suspend jobs]

Jerry Mersel jerry.mersel at weizmann.ac.il
Tue Apr 15 03:05:52 MDT 2008


No one seems to be answering. Please help me with this.
I am stumped and have a deadline chiseled in cement.

Here is my original email:

Thanks,
  Jerry

Any clue at all would be appreciated.


Hi:



  I have set up 2 queues: a "normal" queue and a high priority queue.
  Of the 3 machines I am experimenting on, all 3 can receive jobs from
  the normal queue and 2 can receive jobs from the high priority queue.
  If there are not enough free cpus, a job from the normal queue should
  be suspended. Everything works fine and dandy when I'm working with
  1 node, but when I get into multiple nodes:ppn things don't work so
  well.



  For example (workq is normal q, prio.q is high priority)

  The high priority nodes have the property Jerry.

 

  pbsnodes gives:



node1
     state = free
     np = 2
     properties = Jerry
     ntype = cluster
     status = opsys=linux,uname=Linux node1 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64,sessions=6714 6734,nsessions=2,nusers=1,idletime=286409,totmem=5767200kb,availmem=5638060kb,physmem=3735592kb,ncpus=2,loadave=0.00,netload=1216146266,state=free,jobs=? 0,rectime=1208111732

node3
     state = free
     np = 4
     ntype = cluster
     jobs = 2/144.node4
     status = opsys=linux,uname=Linux node3 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64,sessions=3756 5071 27814 27834 27854 27874 28339,nsessions=7,nusers=3,idletime=289593,totmem=5825352kb,availmem=5645936kb,physmem=12182352kb,ncpus=4,loadave=5.00,netload=694744571,state=free,jobs=144.node4,rectime=1208111732

node4
     state = free
     np = 2
     properties = Jerry
     ntype = cluster
     jobs = 0/169.node4
     status = opsys=linux,uname=Linux node4 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64,sessions=498 2269 3785 4900 29262 29285,nsessions=6,nusers=3,idletime=36361,totmem=5767048kb,availmem=4722596kb,physmem=3735440kb,ncpus=2,loadave=1.00,netload=2458734191,state=free,jobs=169.node4,rectime=1208111729



When I give this command:



qsub -q prio.q -l nodes=2:ppn=2 ./t.sh



I expect the 1 job on node4 to get suspended so that the high priority
job can run on node1 and node4, using 2 cpus on each machine, but
instead the new job just sits in the queue.



If 2 jobs were on node4 it would work fine.
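
The slot arithmetic behind that expectation can be sketched in Python.
This is only an illustration, not part of the cluster setup: the helper
names parse_pbsnodes, free_slots and shortfall are made up, SAMPLE is a
trimmed copy of the pbsnodes output above, and it assumes a prio.q job
may only land on nodes carrying the Jerry property.

```python
# Trimmed copy of the pbsnodes output (status lines left out).
SAMPLE = """\
node1
     state = free
     np = 2
     properties = Jerry
node3
     state = free
     np = 4
     jobs = 2/144.node4
node4
     state = free
     np = 2
     properties = Jerry
     jobs = 0/169.node4
"""


def parse_pbsnodes(text):
    """Return {node name: {"np": int, "properties": set, "jobs": list}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if not line[0].isspace():
            # An unindented line starts a new node record.
            current = {"np": 0, "properties": set(), "jobs": []}
            nodes[line.strip()] = current
            continue
        key, _, value = line.strip().partition(" = ")
        if key == "np":
            current["np"] = int(value)
        elif key == "properties":
            current["properties"] = set(value.split(","))
        elif key == "jobs":
            # One "slot/jobid" entry per occupied virtual processor.
            current["jobs"] = [j.strip() for j in value.split(",")]
    return nodes


def free_slots(node):
    """Virtual processors on the node with no job assigned."""
    return node["np"] - len(node["jobs"])


def shortfall(nodes, prop, ppn):
    """For each node with property `prop`, slots missing to satisfy `ppn`."""
    return {
        name: max(0, ppn - free_slots(node))
        for name, node in sorted(nodes.items())
        if prop in node["properties"]
    }


nodes = parse_pbsnodes(SAMPLE)
print(shortfall(nodes, "Jerry", 2))  # {'node1': 0, 'node4': 1}
```

For nodes=2:ppn=2 this reports that node1 already has both cpus free
while node4 is one cpu short, which is why I expect job 169.node4 to be
the one that gets suspended.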



Here is my maui configuration file:

#
# MAUI configuration example
# @(#)maui.cfg David Groep 20031015.1
# for MAUI version 3.2.5
#
SERVERHOST              node4
ADMIN1                  root
ADMINHOST               node4
#JOBNODEMATCHPOLICY      EXACTNODE
PREEMPTPOLICY SUSPEND
#RESERVATIONPOLICY    NEVER
ENABLEMULTINODEJOBS  TRUE

#
RMTYPE[0]           PBS
RMHOST[0]           node4
RMSERVER[0]         node4

SERVERPORT            40559
SERVERMODE            NORMAL

# Set PBS server polling interval. Since we have many short jobs
# and want fast turn-around, set this to 10 seconds (default: 2 minutes)
RMPOLLINTERVAL        00:00:10

# a max. 10 MByte log file in a logical location
LOGFILE               /var/log/maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3

#NODECFG[node4]   PARTITION=Jerry
#NODECFG[node1]   PARTITION=Jerry

CLASSCFG[DEFAULT]  QDEF=low
CLASSCFG[prio.q]   QDEF=high
QOSCFG[high]  PRIORITY=50000 QFLAGS=PREEMPTOR
QOSCFG[DEFAULT] QFLAGS=PREEMPTEE
QOSWEIGHT     1
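
As a quick consistency check on the class/QOS lines, here is a small
Python sketch that lists every QOS named by a CLASSCFG QDEF that has no
QOSCFG entry of its own. The helper names parse_cfg and undefined_qos
are hypothetical, MAUI_CFG is copied from the last lines of the file
above, and the sketch only compares names. It does not tell you whether
Maui applies the QOSCFG[DEFAULT] flags to such a QOS.

```python
import re

# The CLASSCFG/QOSCFG lines copied from the maui.cfg above.
MAUI_CFG = """\
CLASSCFG[DEFAULT]  QDEF=low
CLASSCFG[prio.q]   QDEF=high
QOSCFG[high]  PRIORITY=50000 QFLAGS=PREEMPTOR
QOSCFG[DEFAULT] QFLAGS=PREEMPTEE
QOSWEIGHT     1
"""

# Matches e.g. "QOSCFG[high]  PRIORITY=50000 QFLAGS=PREEMPTOR".
ENTRY = re.compile(r"^(CLASSCFG|QOSCFG)\[([^\]]+)\]\s+(.*)$")


def parse_cfg(text):
    """Return (classes, qos), each mapping a name to its KEY=VALUE attributes."""
    classes, qos = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = ENTRY.match(line)
        if not match:
            continue  # not a CLASSCFG/QOSCFG line, e.g. QOSWEIGHT
        kind, name, rest = match.groups()
        attrs = dict(tok.split("=", 1) for tok in rest.split() if "=" in tok)
        (classes if kind == "CLASSCFG" else qos)[name] = attrs
    return classes, qos


def undefined_qos(classes, qos):
    """QOS names used as a class QDEF but lacking a QOSCFG[...] line."""
    wanted = {attrs["QDEF"] for attrs in classes.values() if "QDEF" in attrs}
    return sorted(wanted - set(qos))


classes, qos = parse_cfg(MAUI_CFG)
print(undefined_qos(classes, qos))  # ['low']
```

So the normal queue's jobs are given the QOS low, which has no QOSCFG
line of its own in this file.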





I appreciate any advice anyone can give.





                            Thanks,

                              Jerry





