[torqueusers] request: indicate partially used nodes

Michael Sternberg sternberg at anl.gov
Tue Feb 3 06:57:06 MST 2009


Hello developers,

I noticed that "pbsnodes -l" does not flag partially used nodes as  
such in the "state" field.  More specific state indicators would ease  
administration tasks where empty nodes are to be identified (e.g. for  
upgrades or some other maintenance).

Consider the example below -- both nodes are 8-core (and declared as  
np=8 in server_priv/nodes), and run a 4-core-only job, yet are simply  
reported as state "free" or "offline" (draining).  The reason partial  
load arises in my cluster is that we have an older commercial app that  
supports only up to 4 threads.


Does the PBS standard (if there is one) or common practice allow more  
fine-grained state indicators than "free" and "job-exclusive"?

Here is an awk workaround, groping around in the full pbsnodes output:

     Busy:
	pbsnodes -a | awk '/^[^ ]/ {node=$1} /jobs =/ {print node}'

     Free:
	pbsnodes -a | awk '/^[^ ]/ {node=$1} /jobs =/ {node=""} /status =/ && node {print node}'
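
A minimal sketch of how the same parsing could also flag partially used nodes, by comparing the number of entries on the "jobs =" line against np. The function name classify_nodes and the idle/partial/full labels are my own invention, not anything torque provides, and the script assumes the pbsnodes layout shown in the example at the end of this message (node name starting in column 1, indented "key = value" attribute lines, job slots separated by commas):

```shell
# Hypothetical helper: reads "pbsnodes -a" output on stdin and prints
#   <node> <idle|partial|full> <used>/<np> <state>
# Assumes the layout from the example: unindented node names, indented attributes.
classify_nodes() {
    awk '
        function report() {
            if (node == "") return
            if (used == 0)      kind = "idle"
            else if (used < np) kind = "partial"
            else                kind = "full"
            printf "%s %s %d/%d %s\n", node, kind, used, np, state
        }
        # A line starting in column 1 begins a new node record.
        /^[^ \t]/                  { report(); node = $1; np = 0; used = 0; state = "" }
        $1 == "state" && $2 == "=" { state = $3 }
        $1 == "np"    && $2 == "=" { np = $3 }
        # Count the comma-separated slot/jobid entries on the jobs line.
        $1 == "jobs"  && $2 == "=" {
            line = $0
            sub(/^[^=]*= */, "", line)
            used = split(line, slots, /, */)
        }
        END                        { report() }
    '
}
```

Used as "pbsnodes -a | classify_nodes", this should list the two sample nodes as "partial" with 4/8 slots in use, and something like "pbsnodes -a | classify_nodes | awk '$2 == "idle" {print $1}'" would then pick out the empty ones. It counts allocated job slots only, so it says nothing about actual CPU load.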


Is there a better way to identify idle nodes in torque?  [For the two
sample nodes, Moab's "mdiag -n" reports "Busy" with a Procs field of
"4:8" for the node in state "free", which could be used, but "Drained"
with Procs "0:8" for the offline node, which is incorrect.]


My server and most clients are running torque-2.3.6.


With best regards,
Michael


Example:

--------------------------------------------------------------
# pbsnodes n135 n138
n135
      state = free
      np = 8
      ntype = cluster
      jobs = 0/10528.mds01, 1/10528.mds01, 2/10528.mds01, 3/10528.mds01
      status = opsys=linux,uname=Linux n135 2.6.18-92.1.10.el5_lustre.1.6.6smp #1 SMP Tue Aug 26 12:16:17 EDT 2008 x86_64,sessions=24746 24815,nsessions=2,nusers=1,idletime=400874,totmem=33219548kb,availmem=29639392kb,physmem=16439664kb,ncpus=8,loadave=4.08,netload=18800817846,state=free,jobs=10528.mds01,varattr=,rectime=1233666309

n138
      state = offline
      np = 8
      ntype = cluster
      jobs = 0/10438.mds01, 1/10438.mds01, 2/10438.mds01, 3/10438.mds01
      status = opsys=linux,uname=Linux n138 2.6.18-92.1.10.el5_lustre.1.6.6smp #1 SMP Tue Aug 26 12:16:17 EDT 2008 x86_64,sessions=20026,nsessions=1,nusers=1,idletime=2519409,totmem=33219548kb,availmem=32673008kb,physmem=16439664kb,ncpus=8,loadave=4.07,netload=18994455210,state=free,jobs=10438.mds01,varattr=,rectime=1233666336
--------------------------------------------------------------



