[torquedev] Mod we use at Harte-Hanks
brockp at umich.edu
Wed Jul 26 13:53:33 MDT 2006
Yeah it is,
Center for Advanced Computing
brockp at umich.edu
On Jul 26, 2006, at 3:51 PM, Caird, Andrew J wrote:
> Is this pretty much the same as moab's GRES functionality?
> Not that I think that's bad (or good), I'm just checking.
>> -----Original Message-----
>> From: torquedev-bounces at supercluster.org
>> [mailto:torquedev-bounces at supercluster.org] On Behalf Of
>> Jonas_Berlin at harte-hanks.com
>> Sent: Wednesday, July 26, 2006 3:28 PM
>> To: torquedev at supercluster.org
>> Subject: [torquedev] Mod we use at Harte-Hanks
>> I initially sent this to Garrick and he suggested I forward
>> it to the whole list.
>> It is basically a way of implementing license management as a
>> counted resource:
>> At Harte-Hanks we have made some mods to the server and the
>> fifo scheduler to be able to use counted resources.
>> Basically there is a new attribute vector for the server
>> called tokens. A token is a string followed by a float, separated
>> by a colon (radi:2 in the listing below).
>> So, the server owns a set of counted resources (see last lines):
>> Qmgr: p s
>> # Create queues and set their attributes.
>> # Create and define queue default
>> create queue default
>> set queue default queue_type = Execution
>> set queue default enabled = True
>> set queue default started = True
>> #
>> # Create and define queue batch
>> #
>> create queue batch
>> set queue batch queue_type = Execution
>> set queue batch resources_default.nodes = 1
>> set queue batch resources_default.walltime = 01:00:00
>> set queue batch enabled = True
>> set queue batch started = True
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server acl_hosts = abinitio-rd0
>> set server managers = jberlin at abinitio-rd0
>> set server default_queue = batch
>> set server log_events = 511
>> set server mail_from = adm
>> set server query_other_jobs = True
>> set server scheduler_iteration = 600
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server pbs_version = 2.1.1
>> set server tokens = radi:2
>> set server tokens += foobar:2
>> set server tokens += hoobar:3
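To make the name:count format of the tokens attribute concrete, here is a minimal Python sketch of parsing such values into a pool. It is not taken from the patch: the names parse_token and build_pool are hypothetical, and rejecting a repeated token name is an assumption, since the message does not say how the server treats duplicates.

```python
def parse_token(spec):
    """Split a 'name:count' token string into (name, count as float)."""
    name, sep, count = spec.strip().rpartition(":")
    if not sep or not name:
        raise ValueError("expected 'name:count', got %r" % (spec,))
    value = float(count)  # raises ValueError if the count is not numeric
    if value < 0:
        raise ValueError("token count must not be negative: %r" % (spec,))
    return name, value


def build_pool(specs):
    """Build the server-wide pool from a list of 'name:count' strings."""
    pool = {}
    for spec in specs:
        name, count = parse_token(spec)
        # Assumption: a repeated name is an error (not stated in the message).
        if name in pool:
            raise ValueError("duplicate token name: %r" % (name,))
        pool[name] = count
    return pool


# The three token values set on the server in the listing above.
pool = build_pool(["radi:2", "foobar:2", "hoobar:3"])
print(pool)
```

The same parse_token helper would apply to the value a job passes with "-l tokens=...", since it uses the same format.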
>> When a user submits a job through qsub they specify the
>> requested counted resource: "-l tokens=foobar:1"
>> The fifo scheduler aggregates all token usage across all
>> queues, and if the requested resource is available the job
>> runs, otherwise it remains queued:
>> Job Id: 10.abinitio-rd0
>> Job_Name = hello2.ksh
>> Job_Owner = jberlin at abinitio-rd0
>> job_state = Q
>> queue = batch
>> server = abinitio-rd0
>> Checkpoint = u
>> ctime = Wed Jul 12 15:08:12 2006
>> Error_Path = abinitio-rd0:/home/jberlin/hello2.ksh.e10
>> Hold_Types = n
>> Join_Path = n
>> Keep_Files = n
>> Mail_Points = a
>> mtime = Wed Jul 12 15:08:12 2006
>> Output_Path = abinitio-rd0:/home/jberlin/hello2.ksh.o10
>> Priority = 0
>> qtime = Wed Jul 12 15:08:12 2006
>> Rerunable = True
>> Resource_List.neednodes = 1
>> Resource_List.nodect = 1
>> Resource_List.nodes = 1
>> Resource_List.tokens = foobar:2
>> Resource_List.walltime = 01:00:00
>> substate = 10
>> Variable_List = PBS_O_HOME=/home/jberlin,PBS_O_LANG=en_US.UTF-8,
>> euser = jberlin
>> egroup = pdgrp
>> queue_rank = 2
>> queue_type = E
>> comment = Not Running: Max token usage reached
>> etime = Wed Jul 12 15:08:12 2006
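The scheduling rule described above (sum token usage across all queues, run the job only if its request still fits in the server's pool) can be sketched in Python as follows. This is an illustration under stated assumptions, not the fifo scheduler code from the patch: can_run and start_job are hypothetical names, and treating a token that is missing from the pool as unavailable is an assumption.

```python
def can_run(pool, in_use, request):
    """Return True if every requested token fits in what is left of the pool."""
    for name, wanted in request.items():
        # Assumption: a token name absent from the pool has zero capacity.
        available = pool.get(name, 0.0) - in_use.get(name, 0.0)
        if wanted > available:
            return False
    return True


def start_job(in_use, request):
    """Record the tokens allocated to a job that has been started."""
    for name, wanted in request.items():
        in_use[name] = in_use.get(name, 0.0) + wanted


pool = {"radi": 2.0, "foobar": 2.0, "hoobar": 3.0}
in_use = {}  # aggregated over all queues

first_job = {"foobar": 2.0}
second_job = {"foobar": 2.0}  # same request as job 10 above

if can_run(pool, in_use, first_job):
    start_job(in_use, first_job)

# With foobar fully allocated, the second job has to stay queued.
second_job_runs = can_run(pool, in_use, second_job)
print(second_job_runs)
```

In this sketch the second job corresponds to the queued state shown above, where the scheduler sets the comment "Not Running: Max token usage reached".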
>> There is also accounting of who is using what tokens (A means
>> allocation at startup to a specific job, U means current
>> usage across the server, and P means current pool owned by
>> the server):
>> bash-2.05b# more /var/torque/sched_priv/accounting/20060711
>> 07/11/2006 16:14:36;A;4.abinitio-rd0;radi:1
>> 07/11/2006 16:14:36;U;;radi:1.00
>> 07/11/2006 16:14:36;P;;radi:2.00
>> 07/11/2006 16:18:01;A;6.abinitio-rd0;radi:1
>> 07/11/2006 16:18:01;U;;radi:1.00
>> 07/11/2006 16:18:01;P;;radi:2.00
>> 07/11/2006 16:25:49;A;7.abinitio-rd0;foobar:1
>> 07/11/2006 16:25:49;U;;foobar:1.00
>> 07/11/2006 16:25:49;P;;radi:14.00,foobar:2.00
>> 07/12/2006 15:08:08;A;9.abinitio-rd0;foobar:2
>> 07/12/2006 15:08:08;U;;foobar:2.00
>> 07/12/2006 15:08:08;P;;radi:2.00,foobar:2.00,hoobar:3.00
>> 07/12/2006 15:09:08;A;10.abinitio-rd0;foobar:2
>> 07/12/2006 15:09:08;U;;foobar:4.00
>> 07/12/2006 15:09:08;P;;radi:4.00,foobar:4.00,hoobar:6.00
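For anyone who wants to post-process these accounting files, here is a minimal Python sketch that parses the record layout shown above (timestamp;type;job id;token list). The Record type and parse_record function are hypothetical names, and the layout is inferred only from the sample lines, so fields not shown there are not handled.

```python
from collections import namedtuple

# kind is "A" (allocation), "U" (usage) or "P" (pool), as described above.
Record = namedtuple("Record", "timestamp kind job_id tokens")


def parse_record(line):
    """Parse one accounting line into a Record with a {name: count} dict."""
    timestamp, kind, job_id, token_field = line.strip().split(";", 3)
    tokens = {}
    for item in token_field.split(","):
        name, _, count = item.rpartition(":")
        tokens[name] = float(count)
    return Record(timestamp, kind, job_id, tokens)


# Three lines copied from the sample accounting file above.
lines = [
    "07/12/2006 15:08:08;A;9.abinitio-rd0;foobar:2",
    "07/12/2006 15:08:08;U;;foobar:2.00",
    "07/12/2006 15:08:08;P;;radi:2.00,foobar:2.00,hoobar:3.00",
]
records = [parse_record(line) for line in lines]
for record in records:
    print(record.kind, record.job_id, record.tokens)
```

The job id field is empty for U and P records, so it comes back as an empty string.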
>> Here is a patch against 2.1.1.
>> There are two other changes in here that you probably don't want.
>> 1. The abwritelog command with the associated abwritelog in
>> Libcmds. This is used for runtime logging in our ETL
>> environment and very specific to our ETL tool.
>> 2. Since we run the ETL GUI interactively through qsub -I, I
>> messed with qsub, in order to not require qsub -I to be run
>> from a terminal.
>> There are a few bad things about the token implementation:
>> 1. The location of the accounting directory is hard coded.
>> Since the fifo scheduler appears to be using a slightly
>> different make system than the rest of Torque I couldn't
>> figure out how to make the same configuration mechanism that
>> is used in the server work for the fifo scheduler.
>> 2. The definition of the additional server and job attributes
>> should probably be moved from the *site*.ht files to the main
>> .h files to be consistent.
>> 3. To get the new files to compile
>> I messed with the Makefile.in files that are distributed,
>> rather than figuring out how to integrate it into autoconf.
>> Let me know if you have any interest and if I can answer any
>> questions.
>> Jonas Berlin Ph. D.
>> Chief Architect
>> Product & Systems Development
>> 25 Linnell Circle
>> Billerica, MA 01821
>> Phone +1-978-436-2818
>> Mobile +1-508-361-5921
>> Fax +1-978-439-3940
>> jberlin at hartehanks.com