[torquedev] Mod we use at Harte-Hanks
Caird, Andrew J
acaird at umich.edu
Wed Jul 26 13:51:28 MDT 2006
Is this pretty much the same as Moab's GRES functionality?
Not that I think that's bad (or good), I'm just checking.
> -----Original Message-----
> From: torquedev-bounces at supercluster.org
> [mailto:torquedev-bounces at supercluster.org] On Behalf Of
> Jonas_Berlin at harte-hanks.com
> Sent: Wednesday, July 26, 2006 3:28 PM
> To: torquedev at supercluster.org
> Subject: [torquedev] Mod we use at Harte-Hanks
> I initially sent this to Garrick and he suggested I forward
> it to the whole list.
> It is basically a way of implementing license management as a
> counted resource:
> At Harte-Hanks we have made some mods to the server and the
> fifo scheduler to be able to use counted resources.
> Basically there is a new attribute vector for the server
> called tokens. A token is a string followed by a float, i.e.
> name:count, as in radi:2. So, the server owns a set of counted
> resources (see last lines):
> Qmgr: p s
> # Create queues and set their attributes.
> # Create and define queue default
> create queue default
> set queue default queue_type = Execution
> set queue default enabled = True
> set queue default started = True
> #
> # Create and define queue batch
> #
> create queue batch
> set queue batch queue_type = Execution
> set queue batch resources_default.nodes = 1
> set queue batch resources_default.walltime = 01:00:00
> set queue batch enabled = True
> set queue batch started = True
> #
> # Set server attributes.
> set server scheduling = True
> set server acl_hosts = abinitio-rd0
> set server managers = jberlin at abinitio-rd0
> set server default_queue = batch
> set server log_events = 511
> set server mail_from = adm
> set server query_other_jobs = True
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server pbs_version = 2.1.1
> set server tokens = radi:2
> set server tokens += foobar:2
> set server tokens += hoobar:3
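A minimal sketch of how the "name:count" token strings above could be turned into a server pool. The function names parse_token and apply_qmgr_tokens are hypothetical and not part of TORQUE or the patch. The sketch assumes "=" replaces the pool and "+=" adds one token, and, as a further assumption, sums the counts if the same name is added twice.

```python
# Hypothetical sketch: build a token pool (name -> float) from
# "set server tokens" lines like those in the qmgr output above.
# Not the patch's actual code; names are illustrative only.

def parse_token(spec: str) -> tuple[str, float]:
    """Split one 'name:count' token into (name, count)."""
    name, sep, count = spec.partition(":")
    if not sep or not name:
        raise ValueError(f"malformed token: {spec!r}")
    return name, float(count)


def apply_qmgr_tokens(lines: list[str]) -> dict[str, float]:
    """'=' replaces the pool; '+=' adds one token (repeats are summed, an assumption)."""
    pool: dict[str, float] = {}
    for line in lines:
        words = line.split()
        if words[:3] != ["set", "server", "tokens"]:
            continue  # ignore every other server/queue attribute
        op, spec = words[3], words[4]
        name, count = parse_token(spec)
        if op == "=":
            pool = {name: count}
        elif op == "+=":
            pool[name] = pool.get(name, 0.0) + count
        else:
            raise ValueError(f"unsupported operator: {op!r}")
    return pool
```

Fed the three tokens lines shown above, this yields a pool of radi 2.0, foobar 2.0 and hoobar 3.0.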
> When a user submits a job through qsub they specify the
> requested counted resource: "-l tokens=foobar:1"
> The fifo scheduler aggregates token usage across all
> queues; if the requested resource is available, the job
> runs, otherwise it remains queued:
> Job Id: 10.abinitio-rd0
> Job_Name = hello2.ksh
> Job_Owner = jberlin at abinitio-rd0
> job_state = Q
> queue = batch
> server = abinitio-rd0
> Checkpoint = u
> ctime = Wed Jul 12 15:08:12 2006
> Error_Path = abinitio-rd0:/home/jberlin/hello2.ksh.e10
> Hold_Types = n
> Join_Path = n
> Keep_Files = n
> Mail_Points = a
> mtime = Wed Jul 12 15:08:12 2006
> Output_Path = abinitio-rd0:/home/jberlin/hello2.ksh.o10
> Priority = 0
> qtime = Wed Jul 12 15:08:12 2006
> Rerunable = True
> Resource_List.neednodes = 1
> Resource_List.nodect = 1
> Resource_List.nodes = 1
> Resource_List.tokens = foobar:2
> Resource_List.walltime = 01:00:00
> substate = 10
> Variable_List = PBS_O_HOME=/home/jberlin,PBS_O_LANG=en_US.UTF-8,
> euser = jberlin
> egroup = pdgrp
> queue_rank = 2
> queue_type = E
> comment = Not Running: Max token usage reached
> etime = Wed Jul 12 15:08:12 2006
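The admission rule described above (sum token usage across all queues, run the job only if its request still fits in the pool) can be sketched as follows. This is an illustration under assumptions, not the modified fifo scheduler's code: can_run, current_usage and parse_tokens are hypothetical names, and accepting comma-separated requests such as "a:1,b:2" is an assumption borrowed from the format of the P accounting lines.

```python
# Hypothetical sketch of the token admission check; not the patch's code.

def parse_tokens(spec: str) -> dict[str, float]:
    """Parse 'a:1,b:2' into {'a': 1.0, 'b': 2.0}, summing repeated names."""
    out: dict[str, float] = {}
    for item in filter(None, spec.split(",")):
        name, _, count = item.partition(":")
        out[name] = out.get(name, 0.0) + float(count)
    return out


def current_usage(running: list[str]) -> dict[str, float]:
    """Aggregate Resource_List.tokens of running jobs from every queue."""
    usage: dict[str, float] = {}
    for spec in running:
        for name, count in parse_tokens(spec).items():
            usage[name] = usage.get(name, 0.0) + count
    return usage


def can_run(request: str, pool: dict[str, float], running: list[str]) -> bool:
    """True only if each requested token fits in the pool minus current usage."""
    usage = current_usage(running)
    for name, count in parse_tokens(request).items():
        if usage.get(name, 0.0) + count > pool.get(name, 0.0):
            return False  # job stays queued ("Max token usage reached")
    return True
```

With a pool of foobar:2 and job 9 already holding foobar:2, a second request for foobar:2 (job 10 above) is refused, which matches the "Not Running" comment.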
> There is also accounting of which tokens are in use (A records
> the allocation to a specific job when it starts, U the current
> usage across the server, and P the current pool owned by
> the server):
> bash-2.05b# more /var/torque/sched_priv/accounting/20060711
> 07/11/2006 16:14:36;A;4.abinitio-rd0;radi:1
> 07/11/2006 16:14:36;U;;radi:1.00
> 07/11/2006 16:14:36;P;;radi:2.00
> 07/11/2006 16:18:01;A;6.abinitio-rd0;radi:1
> 07/11/2006 16:18:01;U;;radi:1.00
> 07/11/2006 16:18:01;P;;radi:2.00
> 07/11/2006 16:25:49;A;7.abinitio-rd0;foobar:1
> 07/11/2006 16:25:49;U;;foobar:1.00
> 07/11/2006 16:25:49;P;;radi:14.00,foobar:2.00
> 07/12/2006 15:08:08;A;9.abinitio-rd0;foobar:2
> 07/12/2006 15:08:08;U;;foobar:2.00
> 07/12/2006 15:08:08;P;;radi:2.00,foobar:2.00,hoobar:3.00
> 07/12/2006 15:09:08;A;10.abinitio-rd0;foobar:2
> 07/12/2006 15:09:08;U;;foobar:4.00
> 07/12/2006 15:09:08;P;;radi:4.00,foobar:4.00,hoobar:6.00
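A small reader for the accounting lines above may help anyone post-processing them. It assumes the layout seen in the sample, "MM/DD/YYYY HH:MM:SS;TYPE;JOBID;name:count[,name:count...]", with an empty job id on U and P records. TokenRecord, parse_record and allocations_by_job are hypothetical names, not part of TORQUE.

```python
# Hypothetical parser for the token accounting log format shown above.
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TokenRecord:
    when: datetime
    kind: str            # "A" allocation, "U" usage, "P" pool
    job_id: str          # empty for U and P records
    tokens: dict[str, float]


def parse_record(line: str) -> TokenRecord:
    stamp, kind, job_id, spec = line.strip().split(";", 3)
    if kind not in ("A", "U", "P"):
        raise ValueError(f"unknown record type: {kind!r}")
    tokens: dict[str, float] = {}
    for item in filter(None, spec.split(",")):
        name, _, count = item.partition(":")
        tokens[name] = float(count)
    when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    return TokenRecord(when, kind, job_id, tokens)


def allocations_by_job(lines: list[str]) -> dict[str, dict[str, float]]:
    """Collect only the A (per-job allocation) records, keyed by job id."""
    result: dict[str, dict[str, float]] = {}
    for line in lines:
        rec = parse_record(line)
        if rec.kind == "A":
            result[rec.job_id] = rec.tokens
    return result
```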
> Here is a patch against 2.1.1.
> There are two other changes in here that you probably don't want.
> 1. The abwritelog command and the associated abwritelog code in
> Libcmds. This is used for runtime logging in our ETL
> environment and is very specific to our ETL tool.
> 2. Since we run the ETL GUI interactively through qsub -I, I
> modified qsub so that qsub -I does not have to be run
> from a terminal.
> There are a few bad things about the token implementation:
> 1. The location of the accounting directory is hard-coded.
> Since the fifo scheduler appears to use a slightly
> different make system than the rest of Torque, I couldn't
> figure out how to make the configuration mechanism that
> the server uses work for the fifo scheduler.
> 2. The definitions of the additional server and job attributes
> should probably be moved from the *site*.h files to the main
> .h files to be consistent.
> 3. To get the new files to compile, I modified the distributed
> Makefile.in files rather than integrating them into autoconf.
> Let me know if you have any interest and if I can answer any
> questions.
> Jonas Berlin Ph. D.
> Chief Architect
> Product & Systems Development
> 25 Linnell Circle
> Billerica, MA 01821
> Phone +1-978-436-2818
> Mobile +1-508-361-5921
> Fax +1-978-439-3940
> jberlin at hartehanks.com