[Mauiusers] Standing Reservation Problem

Stewart.Samuels at sanofi-aventis.com Stewart.Samuels at sanofi-aventis.com
Wed Feb 15 06:54:21 MST 2006


Hello Mauiusers, 
Following more testing, I find that Maui does not seem to allow two or more standing reservations to specify overlapping subsets of nodes. This is a major problem if, for instance, one needs to set up access to queues at different QOS levels on nodes that are shared with other standing reservations. 
For example, changing the SRCFG configuration from my previous message (quoted below) to: 
SRCFG[prime] CLASSLIST=prime,ghts,test,any,all
SRCFG[prime] PERIOD=INFINITY 
SRCFG[prime] HOSTLIST=mylnxc1-n001 
SRCFG[glide] CLASSLIST=glide,ghts,test,any,all
SRCFG[glide] PERIOD=INFINITY 
SRCFG[glide] HOSTLIST=mylnxc1-n002 
#SRCFG[ghts] CLASSLIST=ghts,test,any,all 
#SRCFG[ghts] PERIOD=INFINITY 
#SRCFG[ghts] HOSTLIST=mylnxc1-n00[1-2] 
works fine. I can submit to all queues, with prime jobs going only to node mylnxc1-n001, glide jobs going only to mylnxc1-n002, and all other jobs going to either node. However, this approach applies every QOSLIST entry in an SRCFG to every CLASSLIST entry of that SRCFG. What I really want is to apply specific QOSLIST entries to specific CLASSLIST entries on specific nodes, using multiple SRCFGs as necessary.
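In other words, what I would like to be able to write is something like the following sketch (the QOS names "fast" and "slow" and the SRCFG names here are hypothetical, and this is exactly the kind of overlapping-HOSTLIST layout that Maui currently seems to reject):

```
# Hypothetical maui.cfg fragment: per-class QOS mapping via one
# SRCFG per class/QOS pairing, with overlapping host lists.
# QOS names "fast"/"slow" are placeholders, not real QOS definitions.
SRCFG[primefast] CLASSLIST=prime QOSLIST=fast
SRCFG[primefast] PERIOD=INFINITY
SRCFG[primefast] HOSTLIST=mylnxc1-n001

SRCFG[glidefast] CLASSLIST=glide QOSLIST=fast
SRCFG[glidefast] PERIOD=INFINITY
SRCFG[glidefast] HOSTLIST=mylnxc1-n002

# Shared classes at a lower QOS on both nodes -- the host list
# overlaps the two reservations above, which is where Maui balks.
SRCFG[shared] CLASSLIST=ghts,test,any,all QOSLIST=slow
SRCFG[shared] PERIOD=INFINITY
SRCFG[shared] HOSTLIST=mylnxc1-n00[1-2]
```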
Is anyone doing this successfully? If so, I would appreciate any help you can provide.
Stewart
	-----Original Message-----
	From: mauiusers-bounces at supercluster.org [mailto:mauiusers-bounces at supercluster.org]On Behalf Of Stewart.Samuels at sanofi-aventis.com
	Sent: Friday, February 10, 2006 5:20 PM
	To: mauiusers at supercluster.org
	Subject: [Mauiusers] Standing Reservation Problem
	
	Mauiusers, 
	I seem to be having trouble understanding the behavior of Maui. We are running Maui on Torque. I have set up queues via Torque and two Standing Reservations via Maui to direct jobs to a small cluster containing 1 Master node and 2 compute nodes. All nodes have a single cpu and 1 GB of RAM.
	The intent of my test is to execute prime jobs on mylnxc1-n001 and glide jobs on mylnxc1-n002 at any time. Additionally, I would like to run ghts, test, any, and all jobs at any time on either node mylnxc1-n001 or mylnxc1-n002. However, when I submit jobs to the prime or glide queues, they get stuck in the queue and never execute. Checkjob shows they are waiting for resources, yet nothing is running on the system (see below). Jobs sent to the other queues execute properly. If I comment out the 3rd standing reservation, then the prime and glide jobs execute properly, but all other jobs now get stuck in the queues with the same message from checkjob. It would appear that Maui won't let me map multiple queues onto the same nodes. Is anyone else experiencing this behavior?
	Is this a function of the policy? I've tried a few different node policy options with the same result for all. It doesn't seem to matter if I change it or not. And, I have the same problem using Maui 3.2.6p11 on Torque 1.2.0p1 as well as on Maui 3.2.6p14 on Torque 2.0.0p4.
	I also have the maui log set to 9 but it essentially confirms the same deferred message as checkjob. I haven't included it in this set of data because of the volume, but I can provide it if required.
	Any help would be greatly appreciated. 
	Stewart Samuels 
	Infrastructure Evolution and Integration 
	Scientific and Medical Affairs 
	Sanofi-Aventis Pharmaceutical 
	1041 Route 202-206 
	Bridgewater, NJ 08807 
	Phone: (908) 231-4762 
	Fax: (908) 231-3488 
	email: Stewart.Samuels at Sanofi-Aventis.com 
	--------------------------------------------------------------------------------------------- 
	[root at mylnxc1-a log]# qmgr -c 'p s' 
	# 
	# Create queues and set their attributes. 
	# 
	# 
	# Create and define queue glide 
	# 
	create queue glide 
	set queue glide queue_type = Execution 
	set queue glide resources_max.nodect = 1 
	set queue glide enabled = True 
	set queue glide started = True 
	# 
	# Create and define queue prime 
	# 
	create queue prime 
	set queue prime queue_type = Execution 
	set queue prime resources_max.nodect = 1 
	set queue prime enabled = True 
	set queue prime started = True 
	# 
	# Create and define queue test 
	# 
	create queue test 
	set queue test queue_type = Execution 
	set queue test resources_max.nodect = 2 
	set queue test enabled = True 
	set queue test started = True 
	# 
	# Create and define queue ghts 
	# 
	create queue ghts 
	set queue ghts queue_type = Execution 
	set queue ghts resources_max.nodect = 2 
	set queue ghts enabled = True 
	set queue ghts started = True 
	# 
	# Create and define queue any 
	# 
	create queue any 
	set queue any queue_type = Execution 
	set queue any resources_max.nodect = 2 
	set queue any enabled = True 
	set queue any started = True 
	# 
	# Create and define queue all 
	# 
	create queue all 
	set queue all queue_type = Execution 
	set queue all resources_max.nodect = 2 
	set queue all enabled = True 
	set queue all started = True 
	# 
	# Set server attributes. 
	# 
	set server scheduling = True 
	set server default_queue = ghts 
	set server log_events = 511 
	set server mail_from = adm 
	set server query_other_jobs = True 
	set server resources_default.neednodes = 1 
	set server resources_default.nodect = 1 
	set server resources_default.nodes = 1 
	set server scheduler_iteration = 600 
	set server node_ping_rate = 300 
	set server node_check_rate = 600 
	set server tcp_timeout = 6 
	set server node_pack = False 
	[root at mylnxc1-a log]# 
	------------------------------------------------------------------------------------------ 
	[root at mylnxc1-a log]# My maui.cfg 
	QUEUETIMEWEIGHT 10 
	BACKFILLPOLICY FIRSTFIT 
	RESERVATIONPOLICY CURRENTHIGHEST 
	#NODEALLOCATIONPOLICY MINRESOURCE 
	JOBNODEMATCHPOLICY EXACTNODE 
	NODEACCESSPOLICY SHARED 
	CLASSCFG[glide] MAXPROC=1 
	CLASSCFG[prime] MAXPROC=1 
	CLASSCFG[test] MAXPROC=2 
	CLASSCFG[ghts] MAXPROC=2 
	CLASSCFG[all] MAXPROC=2 
	CLASSCFG[any] MAXPROC=2 
	CREDWEIGHT 1 
	CLASSWEIGHT 1 
	QOSWEIGHT 1 
	XFACTORWEIGHT 1 
	SRCFG[prime] CLASSLIST=prime 
	SRCFG[prime] PERIOD=INFINITY 
	SRCFG[prime] HOSTLIST=mylnxc1-n001 
	SRCFG[glide] CLASSLIST=glide 
	SRCFG[glide] PERIOD=INFINITY 
	SRCFG[glide] HOSTLIST=mylnxc1-n002 
	SRCFG[ghts] CLASSLIST=ghts,test,any,all 
	SRCFG[ghts] PERIOD=INFINITY 
	SRCFG[ghts] HOSTLIST=mylnxc1-n00[1-2] 
	[nm67109 at mylnxc1-a nm67109]$ checkjob 108 
	checking job 108 
	State: Idle EState: Deferred 
	Creds: user:nm67109 group:lgdgis class:prime qos:DEFAULT 
	WallTime: 00:00:00 of 99:23:59:59 
	SubmitTime: Fri Feb 10 17:04:49 
	(Time Queued Total: 00:00:45 Eligible: 00:00:01) 
	Total Tasks: 1 
	Req[0] TaskCount: 1 Partition: ALL 
	Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0 
	Opsys: [NONE] Arch: [NONE] Features: [NONE] 
	IWD: [NONE] Executable: [NONE] 
	Bypass: 0 StartCount: 0 
	PartitionMask: [ALL] 
	Flags: RESTARTABLE 
	job is deferred. Reason: NoResources (cannot create reservation for job '108' 
	(intital reservation attempt) 
	) 
	Holds: Defer (hold reason: NoResources) 
	PE: 1.00 StartPriority: 1 
	cannot select job 108 for partition DEFAULT (job hold active) 
	[nm67109 at mylnxc1-a nm67109]$ 




	Stewart Samuels
	Infrastructure Evolution and Integration
	Scientific and Medical Affairs
	Sanofi-Aventis Pharmaceutical
	1041 Route 202-206
	Bridgewater, NJ 08807

	Phone: (908) 231-4762
	Fax: (908) 231-3488
	email: Stewart.Samuels at Sanofi-Aventis.com


