[torqueusers] Newly online nodes / queued jobs
Chris Evert
chris at geodev.com
Mon Jan 8 14:11:05 MST 2007
Thanks for the help so far.
It seems the newly online part was a red herring. The real symptom is
that jobs requesting more than one cpu do not start automatically on this
one node, while jobs which request 1 cpu jump right on.
I defined this node in torque after and separately from defining the
other nodes, but I cannot see the difference (except in behavior).
The clues:
checkjob on a job ready for running when the node was idle:
---
checking job 14687
State: Idle
Creds: user:harger group:users class:marine qos:DEFAULT
WallTime: 00:00:00 of 10:00:00:00
SubmitTime: Sat Jan 6 22:37:34
(Time Queued Total: 1:07:45:04 Eligible: 1:07:45:04)
Total Tasks: 4
Req[0] TaskCount: 4 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [gp][cluster]
IWD: [NONE] Executable: [NONE]
Bypass: 1 StartCount: 0
PartitionMask: [ALL]
PE: 4.00 StartPriority: 1905
job cannot run in partition DEFAULT (idle procs do not meet requirements : 0 of 4 procs found)
idle procs: 8 feasible procs: 0
Rejection Reasons: [Features : 3][CPU : 1][State : 99]
---
pbsnodes of bad node:
---
fig24
state = free
np = 8
properties = gp,cluster,Linux26,fig,notfir,notyew,notgpa,notgpb
ntype = cluster
status = opsys=linux,uname=Linux fig24 2.6.16.21-0.25-smp #1 SMP Tue Sep 19 07:26:15 UTC 2006 x86_64,sessions=? 0,nsessions=? 0,nusers=0,idletime=227854,totmem=18654344kb,availmem=18086724kb,physmem=8163940kb,ncpus=8,loadave=0.00,netload=501502391905,state=free,jobs=? 0,rectime=1168258987
---
and pbsnodes of similar and yet good nodes:
---
gpa01
state = job-exclusive
np = 8
properties = gp,cluster,Linux26,gpa,notfig,notfir,notyew,notgpb
ntype = cluster
jobs = 0/14647.saturn, 1/14647.saturn, 2/14647.saturn, 3/14647.saturn, 4/14648.saturn, 5/14648.saturn, 6/14648.saturn, 7/14648.saturn
status = opsys=linux,uname=Linux gpa01 2.6.5-7.97-smp #1 SMP Fri Jul 2 14:21:59 UTC 2004 x86_64,sessions=11375 11509,nsessions=2,nusers=1,idletime=20465703,totmem=32992356kb,availmem=30851432kb,physmem=16212472kb,ncpus=8,loadave=8.00,netload=34436467411,state=free,jobs=14647.saturn14648.saturn,rectime=1168258988
fig23
state = job-exclusive
np = 4
properties = gp,cluster,Linux26,fig,notfir,notyew,notgpa,notgpb
ntype = cluster
jobs = 0/14675.saturn, 1/14675.saturn, 2/14675.saturn, 3/14675.saturn
status = opsys=linux,uname=Linux fig23 2.6.16.21-0.25-smp #1 SMP Tue Sep 19 07:26:15 UTC 2006 x86_64,sessions=29110,nsessions=1,nusers=1,idletime=1611587,totmem=18654476kb,availmem=17073836kb,physmem=8164072kb,ncpus=4,loadave=4.04,netload=224098073924,state=free,jobs=14675.saturn,rectime=1168259002
---
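To rule out a property mismatch, the features requested by job 14687 can be
compared against each node's properties. This is a minimal sketch in Python
using the values pasted above. The helper name missing_features is
hypothetical, and the plain subset test is only an assumption about how the
scheduler matches features to node properties.

```python
# Features requested by job 14687 (from the checkjob output above).
job_features = {"gp", "cluster"}

# Node properties copied from the pbsnodes output above.
node_properties = {
    "fig24": "gp,cluster,Linux26,fig,notfir,notyew,notgpa,notgpb",
    "gpa01": "gp,cluster,Linux26,gpa,notfig,notfir,notyew,notgpb",
    "fig23": "gp,cluster,Linux26,fig,notfir,notyew,notgpa,notgpb",
}


def missing_features(required, properties):
    """Return the required features absent from a comma-separated property list."""
    return required - set(properties.split(","))


for node, props in node_properties.items():
    # An empty list means the node carries every requested feature.
    print(node, sorted(missing_features(job_features, props)))
```

Each node prints an empty list, so under that assumption the feature
request alone would not exclude fig24.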
I don't understand the question marks in the status line of fig24 (the
bad node). Perhaps that points to what I configured wrong...
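
The fields carrying question marks can be picked out mechanically, which
makes it easier to compare fig24 against the good nodes. This is a minimal
sketch in Python, assuming the status string is a comma-separated list of
key=value pairs as it appears above. The function names parse_status and
unknown_fields are hypothetical, and the sketch does not establish what the
"?" means to pbs_mom.

```python
def parse_status(status):
    """Split a pbsnodes status string into a dict of attribute -> value."""
    fields = {}
    for item in status.split(","):
        key, _, value = item.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def unknown_fields(status):
    """Return the attribute names whose value starts with '?'."""
    return [k for k, v in parse_status(status).items() if v.startswith("?")]


# Abbreviated status line for fig24, taken from the pbsnodes output above.
fig24_status = (
    "opsys=linux,sessions=? 0,nsessions=? 0,nusers=0,"
    "ncpus=8,loadave=0.00,state=free,jobs=? 0"
)

print(unknown_fields(fig24_status))  # ['sessions', 'nsessions', 'jobs']
```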
Any help greatly appreciated,
Chris
--
Chris Evert
chris at geodev.com
Geophysical Development Corporation
Houston, TX
Chris Samuel wrote:
> On Friday 05 January 2007 08:56, Chris Evert wrote:
>
> Hi Chris
>
>> So my question, perhaps, should be "How might I tickle maui to recompute
>> available resources?"
>
> This should do it : schedctl -r
>
>> Another question is "Is this the correct forum for maui questions or
>> should I find a mauiusers out there?"
>
> http://www.supercluster.org/mailman/listinfo/mauiusers
>
> But you're likely to get the same people answering you there too. :-)
>
> cheers,
> Chris