[torqueusers] One queue (fast) works - one queue doesn't (medium)
Thomas H Dr Pierce
TPierce at rohmhaas.com
Wed Apr 30 14:41:51 MDT 2008
Dear Schedulers,
I cannot tell whether this is a Torque or a Maui issue.
Basically, one queue (fast) runs jobs and the other (medium) does not.
checkjob shows the queued job with a dedicated SWAP of 16G per task, a
setting I cannot override.
Why would one queue work and the other not?
Thanks for your help.
qstat:
Job id Name User Time Use S Queue
------------------- ---------------- --------------- -------- - -----
392.ralphie base_83 xxxxxx 0 Q medium
395.ralphie Ba yyyyyy 05:18:33 R fast
396.ralphie Ba yyyyyy 05:18:16 R fast
397.ralphie Ba yyyyyy 03:27:32 R fast
402.ralphie cfd zzzzzz 00:25:23 R fast
404.ralphie test_8300 wwwwww 0 Q medium
checkjob 404
checking job 404
State: Idle EState: Deferred
Creds: user:rs0thp group:users class:medium qos:DEFAULT
WallTime: 00:00:00 of 99:23:59:59
SubmitTime: Wed Apr 30 16:20:46
(Time Queued Total: 00:04:03 Eligible: 00:00:01)
Total Tasks: 2
Req[0] TaskCount: 2 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [d1850]
Dedicated Resources Per Task: PROCS: 1 MEM: 250M SWAP: 16G
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
Flags: RESTARTABLE
job is deferred. Reason: NoResources (cannot create reservation for job
'404' (intital reservation attempt)
)
Holds: Defer (hold reason: NoResources)
PE: 15.80 StartPriority: 1
cannot select job 404 for partition DEFAULT (job hold active)
===========================================================================================
qmgr -c "p s"
# Create and define queue fast
#
create queue fast
set queue fast queue_type = Execution
set queue fast Priority = 40
set queue fast max_running = 64
set queue fast acl_host_enable = False
set queue fast acl_hosts = node19
set queue fast acl_hosts += node09
set queue fast acl_hosts += node18
set queue fast acl_hosts += node08
set queue fast resources_default.neednodes = d1950
set queue fast resources_default.nodes = 1
set queue fast resources_available.nodect = 64
set queue fast enabled = True
set queue fast started = True
#
# Create and define queue medium
#
create queue medium
set queue medium queue_type = Execution
set queue medium Priority = 40
set queue medium max_running = 10
set queue medium acl_host_enable = False
set queue medium acl_hosts = node49
set queue medium acl_hosts += node48
set queue medium acl_hosts += node41
set queue medium acl_hosts += node46
set queue medium acl_hosts += node43
set queue medium acl_hosts += node42
set queue medium resources_max.mem = 32gb
set queue medium resources_max.vmem = 32gb
set queue medium resources_default.neednodes = d1850
set queue medium resources_default.nodes = 1
set queue medium resources_available.nodect = 40
set queue medium enabled = True
set queue medium started = True
==================================================================================================
pbsnodes -a
node41
state = free
np = 2
properties = d1850
ntype = cluster
status = opsys=linux,uname=Linux node41 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64,sessions=4329 4379,nsessions=2,nusers=2,idletime=33784,totmem=5678720kb,availmem=2859524kb,physmem=8169096kb,ncpus=4,loadave=1.98,netload=4294967294,state=free,jobs=? 0,rectime=1209587507
node42
state = free
np = 2
properties = d1850
ntype = cluster
status = opsys=linux,uname=Linux node42 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64,sessions=4321 6593,nsessions=2,nusers=2,idletime=818682,totmem=4826752kb,availmem=1952512kb,physmem=8169096kb,ncpus=4,loadave=0.87,netload=4294967294,state=free,jobs=? 0,rectime=1209587506
node43
state = free
np = 2
properties = d1850
ntype = cluster
status = opsys=linux,uname=Linux node43 2.6.9-42.ELsmp #1 SMP Wed Jul 12 23:32:02 EDT 2006 x86_64,sessions=4319,nsessions=1,nusers=1,idletime=818333,totmem=6006400kb,availmem=5874292kb,physmem=8169096kb,ncpus=4,loadave=0.00,netload=38605147,state=free,jobs=? 0,rectime=1209587506
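For reference, the memory figures in the pbsnodes status strings can be pulled out and put next to the 16G SWAP that checkjob reports. This is only a rough sketch of my own: `parse_status` and `kb_to_mb` are hypothetical helper names, the status string is shortened from the node43 entry above, and I am assuming the 16G figure should be compared against the node's reported `totmem`.

```python
def parse_status(status):
    """Split a pbsnodes 'status' string into a dict of key -> value strings."""
    fields = {}
    for item in status.split(","):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments that carry no key=value pair
            fields[key.strip()] = value.strip()
    return fields


def kb_to_mb(value):
    """Convert a value such as '6006400kb' to whole megabytes."""
    if value.endswith("kb"):
        value = value[:-2]
    return int(value) // 1024


# Shortened from the node43 status line; other fields omitted for brevity.
node43 = parse_status(
    "opsys=linux,totmem=6006400kb,availmem=5874292kb,physmem=8169096kb,ncpus=4"
)

totmem_mb = kb_to_mb(node43["totmem"])
physmem_mb = kb_to_mb(node43["physmem"])
requested_swap_mb = 16 * 1024  # SWAP: 16G per task, from checkjob

print("totmem MB:", totmem_mb)
print("physmem MB:", physmem_mb)
print("node reports less than the requested swap:", totmem_mb < requested_swap_mb)
```

Under that assumption, none of the d1850 nodes listed here reports anywhere near 16 GB.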
=====================================================================================================
checknode node46
checking node node46
State: Idle (in current state for 1:03:36)
Configured Resources: PROCS: 2 MEM: 7977M SWAP: 7977M DISK: 1M
Utilized Resources: [NONE]
Dedicated Resources: [NONE]
Opsys: linux Arch: [NONE]
Speed: 1.00 Load: 2.000
Network: [DEFAULT]
Features: [d1850]
Attributes: [Batch]
Classes: [medium 2:2][batch 2:2]
Total Time: 1:02:46:37 Up: 1:47:41 (6.70%) Active: 00:00:00 (0.00%)
Reservations:
NOTE: no reservations on node
ALERT: node is in state Idle but load is high (2.000)
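To make the mismatch concrete, here is a minimal sketch that compares the per-task dedicated resources from checkjob 404 with the configured resources from checknode node46. It is my own illustration, not Maui code: `parse_resources` and `unmet` are hypothetical helpers, and it assumes that each per-task dedicated resource must fit within a single node's configured amount.

```python
import re

# Megabytes per unit suffix; a bare number (e.g. PROCS) is left unscaled.
UNITS = {"M": 1, "G": 1024}


def parse_resources(text):
    """Parse e.g. 'PROCS: 1 MEM: 250M SWAP: 16G' into a dict (memory in MB)."""
    resources = {}
    for name, number, unit in re.findall(r"(\w+):\s*(\d+)([MG]?)", text):
        resources[name] = int(number) * UNITS.get(unit, 1)
    return resources


def unmet(per_task, configured):
    """Return {resource: (needed, configured)} where the need exceeds the node."""
    return {
        name: (need, configured.get(name, 0))
        for name, need in per_task.items()
        if need > configured.get(name, 0)
    }


# Values copied from the checkjob and checknode output above.
job = parse_resources("PROCS: 1 MEM: 250M SWAP: 16G")
node = parse_resources("PROCS: 2 MEM: 7977M SWAP: 7977M DISK: 1M")

shortfalls = unmet(job, node)
print(shortfalls)  # {'SWAP': (16384, 7977)}
```

Only SWAP falls short: PROCS and MEM fit on node46, which is why the 16G figure looks like the blocker.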
------
Sincerely,
Tom Pierce