[torqueusers] Newly online nodes / queued jobs

Chris Evert chris at geodev.com
Mon Jan 8 14:11:05 MST 2007


Thanks for the help so far.

It seems the newly-online part was a red herring.  The real symptom is 
that jobs requesting more than one cpu do not start automatically on this 
one node, while jobs requesting a single cpu jump right on.
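
Something like this minimal pair reproduces it (the exact resource 
syntax of the real jobs is my reconstruction, so treat it as a sketch):
---
# starts immediately when fig24 is the only free node:
echo sleep 60 | qsub -l nodes=1:ppn=1

# sits queued under the same conditions:
echo sleep 60 | qsub -l nodes=1:ppn=4
---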

I defined this node in torque after, and separately from, the other 
nodes, but I cannot see any difference between the definitions (except 
in behavior).
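
For reference, its entry in $TORQUE_HOME/server_priv/nodes is essentially 
as follows (reconstructed from the pbsnodes output below; the real file 
may differ slightly):
---
fig24 np=8 gp cluster Linux26 fig notfir notyew notgpa notgpb
---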

The clues:

checkjob output for a job that was ready to run while the node sat idle:
---
checking job 14687

State: Idle
Creds:  user:harger  group:users  class:marine  qos:DEFAULT
WallTime: 00:00:00 of 10:00:00:00
SubmitTime: Sat Jan  6 22:37:34
   (Time Queued  Total: 1:07:45:04  Eligible: 1:07:45:04)

Total Tasks: 4

Req[0]  TaskCount: 4  Partition: ALL
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [gp][cluster]


IWD: [NONE]  Executable:  [NONE]
Bypass: 1  StartCount: 0
PartitionMask: [ALL]
PE:  4.00  StartPriority:  1905
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 4 procs found)
idle procs:   8  feasible procs:   0

Rejection Reasons: [Features     :    3][CPU          :    1][State        :   99]
---
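
If I read the rejection reasons correctly, candidate nodes were rejected 
on Features, on CPU, and (mostly) on State.  I assume Maui's checknode 
and diagnose commands would show which of those applies to fig24:
---
checknode fig24     # Maui's view of the suspect node
diagnose -n         # per-node state/resource summary
---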

pbsnodes output for the bad node:
---
fig24
      state = free
      np = 8
      properties = gp,cluster,Linux26,fig,notfir,notyew,notgpa,notgpb
      ntype = cluster
      status = opsys=linux,uname=Linux fig24 2.6.16.21-0.25-smp #1 SMP Tue Sep 19 07:26:15 UTC 2006 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=227854,totmem=18654344kb,availmem=18086724kb,physmem=8163940kb,ncpus=8,loadave=0.00,netload=501502391905,state=free,jobs=? 0,rectime=1168258987
---
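
One test I can run as the admin is to force-start the queued job onto 
fig24 with qrun, bypassing Maui, to see whether TORQUE itself will place 
a 4-proc job there (the host-list syntax for multiple procs may need 
adjusting):
---
qrun -H fig24 14687
---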

and pbsnodes output for similar, yet good, nodes:
---
gpa01
      state = job-exclusive
      np = 8
      properties = gp,cluster,Linux26,gpa,notfig,notfir,notyew,notgpb
      ntype = cluster
      jobs = 0/14647.saturn, 1/14647.saturn, 2/14647.saturn, 3/14647.saturn, 4/14648.saturn, 5/14648.saturn, 6/14648.saturn, 7/14648.saturn
      status = opsys=linux,uname=Linux gpa01 2.6.5-7.97-smp #1 SMP Fri Jul 2 14:21:59 UTC 2004 x86_64,sessions=11375 11509,nsessions=2,nusers=1,idletime=20465703,totmem=32992356kb,availmem=30851432kb,physmem=16212472kb,ncpus=8,loadave=8.00,netload=34436467411,state=free,jobs=14647.saturn 14648.saturn,rectime=1168258988

fig23
      state = job-exclusive
      np = 4
      properties = gp,cluster,Linux26,fig,notfir,notyew,notgpa,notgpb
      ntype = cluster
      jobs = 0/14675.saturn, 1/14675.saturn, 2/14675.saturn, 3/14675.saturn
      status = opsys=linux,uname=Linux fig23 2.6.16.21-0.25-smp #1 SMP Tue Sep 19 07:26:15 UTC 2006 x86_64,sessions=29110,nsessions=1,nusers=1,idletime=1611587,totmem=18654476kb,availmem=17073836kb,physmem=8164072kb,ncpus=4,loadave=4.04,netload=224098073924,state=free,jobs=14675.saturn,rectime=1168259002
---
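
Apart from hostnames and kernel versions, the status line is the only 
difference I can spot between the bad node and the good ones; a quick way 
to double-check (assuming a shell with process substitution) is:
---
diff <(pbsnodes fig24) <(pbsnodes fig23)
---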

I don't understand the question marks in the status line of fig24 (the 
bad node).  Perhaps they point to what I configured wrong...
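
My guess is the question marks mean pbs_server has never received a 
clean status report from the MOM on fig24.  If that is right, querying 
the MOM directly from the server should show whether it is reporting at 
all:
---
momctl -d 3 -h fig24    # ask the MOM on fig24 for its diagnostics
---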

Any help greatly appreciated,
Chris
--
Chris Evert
chris at geodev.com
Geophysical Development Corporation
Houston, TX

Chris Samuel wrote:
> On Friday 05 January 2007 08:56, Chris Evert wrote:
> 
> Hi Chris
> 
>> So my question, perhaps, should be "How might I tickle maui to recompute
>> available resources?"
> 
> This should do it : schedctl -r
> 
>> Another question is "Is this the correct forum for maui questions or
>> should I find a mauiusers out there?"
> 
> http://www.supercluster.org/mailman/listinfo/mauiusers
> 
> But you're likely to get the same people answering you there too. :-)
> 
> cheers,
> Chris