[torqueusers] request: indicate partially used nodes
Michael Sternberg
sternberg at anl.gov
Tue Feb 3 06:57:06 MST 2009
Hello developers,
I noticed that "pbsnodes -l" does not flag partially used nodes as
such in the "state" field. More specific state indicators would ease
administration tasks where empty nodes are to be identified (e.g. for
upgrades or some other maintenance).
Consider the example below -- both nodes are 8-core (and declared as
np=8 in server_priv/nodes), and run a 4-core-only job, yet are simply
reported as state "free" or "offline" (draining). The reason partial
load arises in my cluster is that we have an older commercial app that
supports only up to 4 threads.
Does the PBS standard (if there is one) or common practice allow more
fine-grained state indicators than "free" and "job-exclusive"?
Here is an awk workaround, groping around in the full pbsnodes output:
Busy:
pbsnodes -a | awk '/^[^ ]/ {node=$1} /jobs =/ {print node}'
Free:
pbsnodes -a | awk '/^[^ ]/ {node=$1} /jobs =/ {node=""} /status =/ && node {print node}'
Is there a better way to identify idle nodes in torque? [For the two
sample nodes, Moab's "mdiag -n" reports n135 as "Busy" with a Procs
field of "4:8", which could be used, but n138 as "Drained" with Procs
"0:8", which is incorrect.]
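A minimal sketch extending the awk approach above, so that partially used nodes show up as such: it counts the slot entries on each node's "jobs =" line and prints them next to np. The function name node_usage and the output layout are hypothetical. It assumes that attribute lines are indented, as pbsnodes prints them, and that "jobs =" lists one entry per occupied core, as in the example below.

```shell
# Summarize core usage per node from "pbsnodes -a" style output on stdin.
# Prints one line per node: <node> <state> <used>/<np>
# node_usage is a hypothetical helper name, not part of torque.
node_usage() {
    awk '
        # Print the summary for the node collected so far, if any.
        function emit() { if (node != "") print node, state, used "/" np }

        # An unindented, non-empty line starts a new node record.
        /^[^ \t]/          { emit(); node = $1; state = "?"; np = 0; used = 0 }

        # Indented attribute lines of the form "name = value".
        /^[ \t]+state = /  { state = $3 }
        /^[ \t]+np = /     { np = $3 }

        # Count comma-separated slot entries such as "0/10528.mds01".
        /^[ \t]+jobs = /   { sub(/^[ \t]+jobs = /, ""); used = split($0, slots, /, */) }

        END                { emit() }
    '
}
```

Used as "pbsnodes -a | node_usage", a node running a 4-core job on 8 cores would be listed with 4/8 and an empty node with 0/8, regardless of whether its state is "free" or "offline".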
My server and most clients are running torque-2.3.6.
With best regards,
Michael
Example:
--------------------------------------------------------------
# pbsnodes n135 n138
n135
     state = free
     np = 8
     ntype = cluster
     jobs = 0/10528.mds01, 1/10528.mds01, 2/10528.mds01, 3/10528.mds01
     status = opsys=linux,uname=Linux n135 2.6.18-92.1.10.el5_lustre.1.6.6smp #1 SMP Tue Aug 26 12:16:17 EDT 2008 x86_64,sessions=24746 24815,nsessions=2,nusers=1,idletime=400874,totmem=33219548kb,availmem=29639392kb,physmem=16439664kb,ncpus=8,loadave=4.08,netload=18800817846,state=free,jobs=10528.mds01,varattr=,rectime=1233666309
n138
     state = offline
     np = 8
     ntype = cluster
     jobs = 0/10438.mds01, 1/10438.mds01, 2/10438.mds01, 3/10438.mds01
     status = opsys=linux,uname=Linux n138 2.6.18-92.1.10.el5_lustre.1.6.6smp #1 SMP Tue Aug 26 12:16:17 EDT 2008 x86_64,sessions=20026,nsessions=1,nusers=1,idletime=2519409,totmem=33219548kb,availmem=32673008kb,physmem=16439664kb,ncpus=8,loadave=4.07,netload=18994455210,state=free,jobs=10438.mds01,varattr=,rectime=1233666336
--------------------------------------------------------------