[Mauiusers] trouble with classlist

Jon Wright jon at gate.sinica.edu.tw
Mon Mar 16 21:34:06 MDT 2009



Hi,

We have multiple sets of nodes and queues and, as far as possible, try 
to push jobs from a given queue to one set of nodes first and, only if 
those are all busy, to another set.

queue parallel -> d,f,k nodes (10 of each, total 30)
queue medium64 -> a and l nodes + temp1 (total of 25)

In the past we have used SRCFG for this, as below:

# Tie E4400, E2160 and E6600 machines to the medium64 queue
SRCFG[medium64]    HOSTLIST=a01,a02,a03,a04,a05,a06,a07,a08,a09,a10,a11,a12,a13,a14,a15,a16,a17,a18,a19,a20,l01,l02,l03,l04,temp1
SRCFG[medium64]    CLASSLIST=medium64
SRCFG[medium64]    PERIOD=INFINITY
SRCFG[medium64]    RESOURCES=PROCS:-1

# Tie the parallel queue to the quad-core Phenom CPUs
SRCFG[parallel]    HOSTLIST=f01,f02,f03,f04,f05,f06,f07,f08,f09,f10,d01,d02,d03,d04,d05,d06,d07,d08,d09,d10,k01,k02,k03,k04,k05,k06,k07,k08,k09,k10
SRCFG[parallel]    CLASSLIST=parallel,medium64-
SRCFG[parallel]    PERIOD=INFINITY
SRCFG[parallel]    RESOURCES=PROCS:-1
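
As a sanity check on the host counts quoted earlier (25 and 30), here is a minimal Python sketch that counts the hosts per reservation in SRCFG lines of this shape. The helpers `count_hosts` and `host_range` are hypothetical names made up for the illustration, and the parsing is a simplification, not how Maui itself reads maui.cfg.

```python
import re
from collections import defaultdict

def count_hosts(cfg_text):
    """Return {reservation_name: host_count} from SRCFG[...] HOSTLIST= lines."""
    counts = defaultdict(int)
    pattern = re.compile(r"^SRCFG\[(\w+)\]\s+HOSTLIST=(\S+)")
    for line in cfg_text.splitlines():
        m = pattern.match(line.strip())
        if m:
            name, hosts = m.groups()
            # Ignore empty entries left by a trailing comma.
            counts[name] += len([h for h in hosts.split(",") if h])
    return dict(counts)

def host_range(prefix, n):
    """Hypothetical helper: build 'a01,a02,...' style host lists."""
    return ",".join(f"{prefix}{i:02d}" for i in range(1, n + 1))

# Rebuild the two host lists from the config above.
medium64_hosts = ",".join([host_range("a", 20), host_range("l", 4), "temp1"])
parallel_hosts = ",".join(host_range(p, 10) for p in ("f", "d", "k"))

cfg = f"""
SRCFG[medium64]    HOSTLIST={medium64_hosts}
SRCFG[medium64]    CLASSLIST=medium64
SRCFG[parallel]    HOSTLIST={parallel_hosts}
SRCFG[parallel]    CLASSLIST=parallel,medium64-
"""

print(count_hosts(cfg))  # {'medium64': 25, 'parallel': 30}
```

Note that a HOSTLIST value wrapped onto a second line would not be matched by this pattern, so each SRCFG entry has to sit on a single line for the count to be right.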


However, what now happens (maui-3.2.21) is that any job submitted to 
the medium64 queue is always sent to the f, d or k nodes first and not 
to the a machines.

In fact, when considering the nodes, maui does not even consider the a 
machines to be available:
03/17 11:01:40 INFO:     processing node request line '1:ppn=1'
03/17 11:01:40 INFO:     job '343129' loaded:   1      jon    staff 1209600       Idle   0 1237258899   [NONE] [NONE] [NONE] >=      0 >=      0 [NONE] 1237258900
03/17 11:01:40 INFO:     15 PBS jobs detected on RM vanguard
03/17 11:01:40 INFO:     jobs detected: 15
03/17 11:01:40 INFO:     total jobs selected (ALL): 1/15 [State: 14]
03/17 11:01:40 INFO:     total jobs selected (ALL): 1/15 [State: 14]
03/17 11:01:40 INFO:     total jobs selected in partition ALL: 1/1
03/17 11:01:40 MQueueScheduleRJobs(Q)
03/17 11:01:40 INFO:     total jobs selected in partition ALL: 1/1
03/17 11:01:40 INFO:     total jobs selected in partition DEFAULT: 1/1
03/17 11:01:40 MQueueScheduleIJobs(Q,DEFAULT)
03/17 11:01:40 INFO:     370 feasible tasks found for job 343129:0 in partition DEFAULT (1 Needed)
03/17 11:01:40 INFO:     tasks located for job 343129:  1 of 1 required (120 feasible)
03/17 11:01:40 MJobStart(343129)
03/17 11:01:40 MRMJobStart(343129,Msg,SC)
03/17 11:01:40 MPBSJobStart(343129,vanguard,Msg,SC)
03/17 11:01:40 MPBSJobModify(343129,Resource_List,Resource,k10)
03/17 11:01:40 MPBSJobModify(343129,Resource_List,Resource,1:ppn=1)
03/17 11:01:40 INFO:     job '343129' successfully started
03/17 11:01:40 INFO:     starting job '343129'
03/17 11:01:40 INFO:     1 jobs started on iteration 2
Active Jobs------

The "120 feasible" figure indicates that the a machines are not being 
considered, because 120 is the number of CPUs available on the 30 d, k 
and f machines.
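
The arithmetic behind that reading, as a minimal Python sketch. It assumes 4 cores on each d, k and f node (they are described as quad core in the config comment); the function name `feasible_tasks` is made up for the illustration, and the log line is the one quoted above joined onto a single line.

```python
import re

# The "tasks located" line from the log above, joined onto one line.
LOG_LINE = ("03/17 11:01:40 INFO:     tasks located for job 343129:  "
            "1 of 1 required (120 feasible)")

def feasible_tasks(line):
    """Extract N from a '(N feasible)' suffix, or None if it is absent."""
    m = re.search(r"\((\d+) feasible\)", line)
    return int(m.group(1)) if m else None

PARALLEL_NODES = 30   # 10 each of d, f and k
CORES_PER_NODE = 4    # assumption: quad-core nodes

parallel_procs = PARALLEL_NODES * CORES_PER_NODE
print(feasible_tasks(LOG_LINE), parallel_procs)  # 120 120
```

If the a, l and temp1 machines were also counted as feasible for this job, the figure would have to be larger than the 120 processors of the parallel set alone.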

This used to work in the past. NODEALLOCATIONPOLICY is set to 
MINRESOURCE and BACKFILL to BESTFIT.
We have another couple of queues linked in a similar manner and they 
seem to be working fine, but in this case it just doesn't work as I 
expect it to. Obviously I have something wrong, but any help would be 
appreciated.

Jon

