[torqueusers] One queue (fast) works - one queue doesn't (medium)

Thomas H Dr Pierce TPierce at rohmhaas.com
Wed Apr 30 14:41:51 MDT 2008


Dear Schedulers,
I cannot tell whether this is a Torque or a Maui issue.

One queue (fast) runs jobs and the other (medium) does not. According to 
checkjob, each task of the queued medium job requires 16 GB of swap 
(SWAP: 16G), a setting I have not been able to override.

Why would one queue work and the other not?

Thanks for your help.

qstat
Job id              Name             User            Time Use S Queue
------------------- ---------------- --------------- -------- - -----
392.ralphie         base_83          xxxxxx                 0 Q medium
395.ralphie         Ba               yyyyyy          05:18:33 R fast
396.ralphie         Ba               yyyyyy          05:18:16 R fast
397.ralphie         Ba               yyyyyy          03:27:32 R fast
402.ralphie         cfd              zzzzzz          00:25:23 R fast
404.ralphie         test_8300        wwwwww                 0 Q medium


 checkjob 404


checking job 404

State: Idle  EState: Deferred
Creds:  user:rs0thp  group:users  class:medium  qos:DEFAULT
WallTime: 00:00:00 of 99:23:59:59
SubmitTime: Wed Apr 30 16:20:46
  (Time Queued  Total: 00:04:03  Eligible: 00:00:01)

Total Tasks: 2

Req[0]  TaskCount: 2  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [d1850]
Dedicated Resources Per Task: PROCS: 1  MEM: 250M  SWAP: 16G


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [ALL]
Flags:       RESTARTABLE

job is deferred.  Reason:  NoResources  (cannot create reservation for job '404' (intital reservation attempt)
)
Holds:    Defer  (hold reason:  NoResources)
PE:  15.80  StartPriority:  1
cannot select job 404 for partition DEFAULT (job hold active)

===========================================================================================
qmgr -c "p s"
# Create and define queue fast
#
create queue fast
set queue fast queue_type = Execution
set queue fast Priority = 40
set queue fast max_running = 64
set queue fast acl_host_enable = False
set queue fast acl_hosts = node19
set queue fast acl_hosts += node09
set queue fast acl_hosts += node18
set queue fast acl_hosts += node08
set queue fast resources_default.neednodes = d1950
set queue fast resources_default.nodes = 1
set queue fast resources_available.nodect = 64
set queue fast enabled = True
set queue fast started = True
#
# Create and define queue medium
#
create queue medium
set queue medium queue_type = Execution
set queue medium Priority = 40
set queue medium max_running = 10
set queue medium acl_host_enable = False
set queue medium acl_hosts = node49
set queue medium acl_hosts += node48
set queue medium acl_hosts += node41
set queue medium acl_hosts += node46
set queue medium acl_hosts += node43
set queue medium acl_hosts += node42
set queue medium resources_max.mem = 32gb
set queue medium resources_max.vmem = 32gb
set queue medium resources_default.neednodes = d1850
set queue medium resources_default.nodes = 1
set queue medium resources_available.nodect = 40
set queue medium enabled = True
set queue medium started = True
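
To see exactly where the two queue definitions diverge, the settings above can be compared mechanically. The following is a minimal sketch, not part of Torque: the helper names (parse_queues, diff_queues) are hypothetical, and the acl_hosts lines are left out of the sample because acl_host_enable is False on both queues. It shows that only medium sets resources_max.mem and resources_max.vmem. A 32gb vmem limit spread over the job's 2 tasks would be 16 GB per task, which matches the SWAP: 16G figure from checkjob, but whether that limit is really where the figure comes from is an assumption I have not confirmed.

```python
import re

# "set queue" lines copied from the qmgr output above.
# The acl_hosts lines are omitted (assumption: they are not enforced,
# since acl_host_enable = False on both queues).
QMGR_OUTPUT = """\
set queue fast queue_type = Execution
set queue fast Priority = 40
set queue fast max_running = 64
set queue fast acl_host_enable = False
set queue fast resources_default.neednodes = d1950
set queue fast resources_default.nodes = 1
set queue fast resources_available.nodect = 64
set queue fast enabled = True
set queue fast started = True
set queue medium queue_type = Execution
set queue medium Priority = 40
set queue medium max_running = 10
set queue medium acl_host_enable = False
set queue medium resources_max.mem = 32gb
set queue medium resources_max.vmem = 32gb
set queue medium resources_default.neednodes = d1850
set queue medium resources_default.nodes = 1
set queue medium resources_available.nodect = 40
set queue medium enabled = True
set queue medium started = True
"""

SET_QUEUE = re.compile(r"^set queue (\S+) (\S+) = (.+)$")


def parse_queues(text):
    """Return {queue_name: {attribute: value}} from plain 'set queue ... = ...' lines."""
    queues = {}
    for line in text.splitlines():
        match = SET_QUEUE.match(line.strip())
        if match:
            name, attr, value = match.groups()
            queues.setdefault(name, {})[attr] = value.strip()
    return queues


def diff_queues(queues, first, second):
    """Return attributes whose values differ, or that only one queue sets."""
    attrs = sorted(set(queues[first]) | set(queues[second]))
    return {
        attr: (queues[first].get(attr), queues[second].get(attr))
        for attr in attrs
        if queues[first].get(attr) != queues[second].get(attr)
    }


queues = parse_queues(QMGR_OUTPUT)
differences = diff_queues(queues, "fast", "medium")
for attr, (fast_value, medium_value) in differences.items():
    print(f"{attr}: fast={fast_value} medium={medium_value}")
```

A value of None in the printed output means that queue does not set the attribute at all.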

==================================================================================================
pbsnodes -a

node41
     state = free
     np = 2
     properties = d1850
     ntype = cluster
     status = opsys=linux,uname=Linux node41 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64,sessions=4329 4379,nsessions=2,nusers=2,idletime=33784,totmem=5678720kb,availmem=2859524kb,physmem=8169096kb,ncpus=4,loadave=1.98,netload=4294967294,state=free,jobs=? 0,rectime=1209587507

node42
     state = free
     np = 2
     properties = d1850
     ntype = cluster
     status = opsys=linux,uname=Linux node42 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64,sessions=4321 6593,nsessions=2,nusers=2,idletime=818682,totmem=4826752kb,availmem=1952512kb,physmem=8169096kb,ncpus=4,loadave=0.87,netload=4294967294,state=free,jobs=? 0,rectime=1209587506

node43
     state = free
     np = 2
     properties = d1850
     ntype = cluster
     status = opsys=linux,uname=Linux node43 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64,sessions=4319,nsessions=1,nusers=1,idletime=818333,totmem=6006400kb,availmem=5874292kb,physmem=8169096kb,ncpus=4,loadave=0.00,netload=38605147,state=free,jobs=? 0,rectime=1209587506
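
The memory figures are easier to compare once they are pulled out of the status strings and converted to megabytes. This is a minimal sketch under stated assumptions: parse_status and kb_to_mb are hypothetical helper names, the sample strings are trimmed to the three memory fields from the listing above, and 1 MB is taken as 1024 kB.

```python
# Memory fields copied from the pbsnodes status lines above
# (all other fields trimmed for brevity).
NODE_STATUS = {
    "node41": "totmem=5678720kb,availmem=2859524kb,physmem=8169096kb",
    "node42": "totmem=4826752kb,availmem=1952512kb,physmem=8169096kb",
    "node43": "totmem=6006400kb,availmem=5874292kb,physmem=8169096kb",
}


def parse_status(status):
    """Split a comma-separated status value into a dict of key -> value strings."""
    fields = {}
    for item in status.split(","):
        key, sep, value = item.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def kb_to_mb(value):
    """Convert a value such as '8169096kb' to whole megabytes (assumes 1 MB = 1024 kB)."""
    if not value.endswith("kb"):
        raise ValueError(f"expected a value ending in 'kb', got {value!r}")
    return int(value[:-2]) // 1024


memory_mb = {
    node: {
        key: kb_to_mb(value)
        for key, value in parse_status(status).items()
        if key in ("totmem", "availmem", "physmem")
    }
    for node, status in NODE_STATUS.items()
}

for node, fields in memory_mb.items():
    print(node, fields)
```

With this conversion, physmem comes out as 7977 MB on all three nodes, the same number that checknode reports as MEM: 7977M for node46.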

=====================================================================================================
checknode node46


checking node node46

State:      Idle  (in current state for 1:03:36)
Configured Resources: PROCS: 2  MEM: 7977M  SWAP: 7977M  DISK: 1M
Utilized   Resources: [NONE]
Dedicated  Resources: [NONE]
Opsys:         linux  Arch:      [NONE]
Speed:      1.00  Load:       2.000
Network:    [DEFAULT]
Features:   [d1850]
Attributes: [Batch]
Classes:    [medium 2:2][batch 2:2]

Total Time: 1:02:46:37  Up: 1:47:41 (6.70%)  Active: 00:00:00 (0.00%)

Reservations:
NOTE:  no reservations on node
ALERT:  node is in state Idle but load is high (2.000)
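
Putting the per-task request from checkjob next to the configured resources from checknode makes the mismatch explicit. The sketch below is only a simplified comparison and not how Maui actually decides feasibility; parse_resources is a hypothetical helper, and it assumes the G suffix means 1024 M.

```python
import re

# Lines copied from the checkjob 404 and checknode node46 output above.
JOB_LINE = "Dedicated Resources Per Task: PROCS: 1  MEM: 250M  SWAP: 16G"
NODE_LINE = "Configured Resources: PROCS: 2  MEM: 7977M  SWAP: 7977M  DISK: 1M"

# Assumption: sizes use binary multiples, so 1G = 1024M.
UNITS_MB = {"": 1, "M": 1, "G": 1024}
RESOURCE = re.compile(r"\b(PROCS|MEM|SWAP|DISK): (\d+)([MG]?)")


def parse_resources(line):
    """Return {name: amount}; sizes are in MB, PROCS is a plain count."""
    return {
        name: int(number) * UNITS_MB[unit]
        for name, number, unit in RESOURCE.findall(line)
    }


per_task = parse_resources(JOB_LINE)
configured = parse_resources(NODE_LINE)

# A resource is a shortfall if a single task asks for more than the node has.
shortfalls = {
    name: (needed, configured.get(name, 0))
    for name, needed in per_task.items()
    if needed > configured.get(name, 0)
}

for name, (needed, available) in shortfalls.items():
    print(f"{name}: task needs {needed}M, node has {available}M")
```

On these two lines the only shortfall is SWAP: one task asks for 16384M while node46 is configured with 7977M, which is consistent with the NoResources deferral.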


------
Sincerely,

   Tom Pierce
 