[torquedev] Mod we use at Harte-Hanks

Brock Palen brockp at umich.edu
Wed Jul 26 13:53:33 MDT 2006


Yeah, it is.

Brock Palen
Center for Advanced Computing
brockp at umich.edu
(734)936-1985


On Jul 26, 2006, at 3:51 PM, Caird, Andrew J wrote:

>
> Is this pretty much the same as Moab's GRES functionality?
>
> http://www.clusterresources.com/products/mwm/docs/12.5generalresources.shtml
>
> Not that I think that's bad (or good), I'm just checking.
>
> --andy
>
>> -----Original Message-----
>> From: torquedev-bounces at supercluster.org
>> [mailto:torquedev-bounces at supercluster.org] On Behalf Of
>> Jonas_Berlin at harte-hanks.com
>> Sent: Wednesday, July 26, 2006 3:28 PM
>> To: torquedev at supercluster.org
>> Subject: [torquedev] Mod we use at Harte-Hanks
>>
>>
>> Hi,
>>
>> I initially sent this to Garrick and he suggested I forward
>> it to the whole list.
>> It is basically a way of implementing license management as a
>> counted resource:
>>
>> At Harte-Hanks we have made some mods to the server and the
>> fifo scheduler to be able to use counted resources.
>> Basically there is a new attribute vector for the server
>> called tokens. A token is a string followed by a float, e.g.
>> "foobar:2".
>> So the server owns a set of counted resources (see last lines):
>>
>> Qmgr: p s
>> #
>> # Create queues and set their attributes.
>> #
>> #
>> # Create and define queue default
>> #
>> create queue default
>> set queue default queue_type = Execution
>> set queue default enabled = True
>> set queue default started = True
>> #
>> # Create and define queue batch
>> #
>> create queue batch
>> set queue batch queue_type = Execution
>> set queue batch resources_default.nodes = 1
>> set queue batch resources_default.walltime = 01:00:00
>> set queue batch enabled = True
>> set queue batch started = True
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server acl_hosts = abinitio-rd0
>> set server managers = jberlin at abinitio-rd0
>> set server default_queue = batch
>> set server log_events = 511
>> set server mail_from = adm
>> set server query_other_jobs = True
>> set server scheduler_iteration = 600
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server pbs_version = 2.1.1
>> set server tokens = radi:2
>> set server tokens += foobar:2
>> set server tokens += hoobar:3
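
For anyone following along, here is a minimal sketch of how the "name:count" token strings and the `set server tokens` lines above could be read into a pool. It is illustrative Python, not the patch itself (which is C against Torque 2.1.1). The names `parse_token` and `build_pool` are hypothetical, and the idea that a plain `=` resets the vector while `+=` appends to it is an assumption inferred from the listing.

```python
import re

def parse_token(text):
    """Split a token such as 'foobar:2' into (name, count as float)."""
    name, sep, count = text.strip().rpartition(":")
    if not sep or not name:
        raise ValueError(f"expected 'name:count', got {text!r}")
    return name, float(count)

# Matches 'set server tokens = radi:2' and 'set server tokens += foobar:2'.
_TOKEN_LINE = re.compile(r"^set server tokens\s*(\+?=)\s*(\S+)$")

def build_pool(lines):
    """Collect the server's counted resources from qmgr-style lines."""
    pool = {}
    for line in lines:
        match = _TOKEN_LINE.match(line.strip())
        if match is None:
            continue  # not a tokens line
        op, token = match.groups()
        if op == "=":
            pool = {}  # assumption: plain '=' replaces the whole vector
        name, count = parse_token(token)
        pool[name] = count
    return pool

pool = build_pool([
    "set server pbs_version = 2.1.1",
    "set server tokens = radi:2",
    "set server tokens += foobar:2",
    "set server tokens += hoobar:3",
])
print(pool)
```

With the three token lines from the listing this yields a pool of radi 2, foobar 2 and hoobar 3, stored as floats.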
>>
>> When a user submits a job through qsub they specify the
>> requested counted resource: "-l tokens=foobar:1"
>>
>> The fifo scheduler aggregates all token usage across all
>> queues, and if the requested resource is available the job
>> runs, otherwise it remains queued:
>>
>> Job Id: 10.abinitio-rd0
>>     Job_Name = hello2.ksh
>>     Job_Owner = jberlin at abinitio-rd0
>>     job_state = Q
>>     queue = batch
>>     server = abinitio-rd0
>>     Checkpoint = u
>>     ctime = Wed Jul 12 15:08:12 2006
>>     Error_Path = abinitio-rd0:/home/jberlin/hello2.ksh.e10
>>     Hold_Types = n
>>     Join_Path = n
>>     Keep_Files = n
>>     Mail_Points = a
>>     mtime = Wed Jul 12 15:08:12 2006
>>     Output_Path = abinitio-rd0:/home/jberlin/hello2.ksh.o10
>>     Priority = 0
>>     qtime = Wed Jul 12 15:08:12 2006
>>     Rerunable = True
>>     Resource_List.neednodes = 1
>>     Resource_List.nodect = 1
>>     Resource_List.nodes = 1
>>     Resource_List.tokens = foobar:2
>>     Resource_List.walltime = 01:00:00
>>     substate = 10
>>     Variable_List = PBS_O_HOME=/home/jberlin,PBS_O_LANG=en_US.UTF-8,
>>         PBS_O_LOGNAME=jberlin,
>>         PBS_O_PATH=/usr/java/j2sdk1.4.2_11/bin:/usr/atria/bin/:
>>         /prod/software/bin:/usr/local/bin:/opt/syncsort/bin:
>>         /opt/SUNWspro/bin:/tools/bin:/bin:/usr/bin:/usr/ucb:
>>         /usr/ccs/bin:/etc:/usr/etc:/usr/bin/X11:/bin:.:
>>         /usr/local/abinitio/bin:/usr/kerberos/bin:/usr/local/bin:
>>         /bin:/usr/bin:/usr/X11R6/bin:/share/change/Opus:
>>         /share/change:/usr/local/abinitio/bin:
>>         /sandbox/abinitio/sand/stdenv/tools:
>>         /u01/app/oracle/product/10.1.0.3:
>>         /u01/app/oracle/product/10.1.0.3/bin:
>>         /u01/app/oracle/product/10.1.0.3/lib,
>>         PBS_O_MAIL=/var/spool/mail/jberlin,PBS_O_SHELL=/bin/ksh,
>>         PBS_O_HOST=abinitio-rd0,PBS_O_WORKDIR=/home/jberlin,
>>         PBS_O_QUEUE=batch
>>     euser = jberlin
>>     egroup = pdgrp
>>     queue_rank = 2
>>     queue_type = E
>>     comment = Not Running: Max token usage reached
>>     etime = Wed Jul 12 15:08:12 2006
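
The run-or-stay-queued decision described above can be sketched roughly as follows. This is a hedged Python illustration of the idea (sum usage over every running job regardless of queue, then compare against the server pool), not the C code in the patch. The function names and the dictionary layout are hypothetical, and the `hoobar:1` request is made up for contrast.

```python
def tokens_in_use(running_jobs):
    """Sum token requests over all running jobs, regardless of queue."""
    used = {}
    for request in running_jobs.values():
        for name, count in request.items():
            used[name] = used.get(name, 0.0) + count
    return used

def can_run(request, pool, used):
    """True if every requested token still fits within the server pool."""
    for name, count in request.items():
        if used.get(name, 0.0) + count > pool.get(name, 0.0):
            return False
    return True

# Pool as configured on the server; job 9 already holds both foobar tokens.
pool = {"radi": 2.0, "foobar": 2.0, "hoobar": 3.0}
running = {"9.abinitio-rd0": {"foobar": 2.0}}
used = tokens_in_use(running)

print(can_run({"foobar": 2.0}, pool, used))  # job 10's request
print(can_run({"hoobar": 1.0}, pool, used))  # hypothetical request
```

Under these assumptions job 10's request for foobar:2 does not fit, which matches the "Max token usage reached" comment in the listing, while a request for an unused token would be allowed to start.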
>>
>> There is also accounting of who is using what tokens (A means
>> allocation at startup to a specific job, U means current
>> usage across the server, and P means current pool owned by
>> the server):
>>
>> bash-2.05b# more /var/torque/sched_priv/accounting/20060711
>> 07/11/2006 16:14:36;A;4.abinitio-rd0;radi:1
>> 07/11/2006 16:14:36;U;;radi:1.00
>> 07/11/2006 16:14:36;P;;radi:2.00
>> 07/11/2006 16:18:01;A;6.abinitio-rd0;radi:1
>> 07/11/2006 16:18:01;U;;radi:1.00
>> 07/11/2006 16:18:01;P;;radi:2.00
>> 07/11/2006 16:25:49;A;7.abinitio-rd0;foobar:1
>> 07/11/2006 16:25:49;U;;foobar:1.00
>> 07/11/2006 16:25:49;P;;radi:14.00,foobar:2.00
>> 07/12/2006 15:08:08;A;9.abinitio-rd0;foobar:2
>> 07/12/2006 15:08:08;U;;foobar:2.00
>> 07/12/2006 15:08:08;P;;radi:2.00,foobar:2.00,hoobar:3.00
>> 07/12/2006 15:09:08;A;10.abinitio-rd0;foobar:2
>> 07/12/2006 15:09:08;U;;foobar:4.00
>> 07/12/2006 15:09:08;P;;radi:4.00,foobar:4.00,hoobar:6.00
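
The accounting lines above look like semicolon-separated records: a timestamp, a record type (A, U or P), a job id that is empty for U and P records, and a comma-separated token list. A minimal Python sketch for reading one record, assuming that layout holds for every line (the `parse_record` name and the returned dictionary shape are hypothetical):

```python
from datetime import datetime

def parse_record(line):
    """Parse 'MM/DD/YYYY HH:MM:SS;TYPE;JOBID;name:count[,name:count...]'."""
    stamp, kind, job_id, payload = line.strip().split(";", 3)
    tokens = {}
    for item in payload.split(","):
        name, _, count = item.rpartition(":")
        tokens[name] = float(count)
    return {
        "time": datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S"),
        "kind": kind,            # 'A', 'U' or 'P'
        "job": job_id or None,   # empty for U and P records
        "tokens": tokens,
    }

alloc = parse_record("07/12/2006 15:08:08;A;9.abinitio-rd0;foobar:2")
print(alloc["kind"], alloc["job"], alloc["tokens"])

pool = parse_record("07/12/2006 15:08:08;P;;radi:2.00,foobar:2.00,hoobar:3.00")
print(pool["job"], pool["tokens"])
```

Splitting on at most three semicolons keeps the token list intact even if it ever contained one, and the empty job field on U and P lines comes back as `None`.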
>>
>> Here is a patch against 2.1.1.
>>
>> There are two other changes in here that you probably don't want.
>>
>> 1. The abwritelog command with the associated abwritelog in
>> Libcmds. This is used for runtime logging in our ETL
>> environment and very specific to our ETL tool.
>> 2. Since we run the ETL GUI interactively through qsub -I, I
>> messed with qsub, in order to not require qsub -I to be run
>> from a terminal.
>>
>> There are a few bad things about the token implementation:
>>
>> 1. The location of the accounting directory is hard coded.
>> Since the fifo scheduler appears to be using a slightly
>> different make system than the rest of Torque, I couldn't
>> figure out how to make the configuration mechanism that
>> is used in the server work for the fifo scheduler.
>> 2. The definition of the additional server and job attributes
>> should probably be moved from the *site*.ht files to the main
>> .h files to be consistent.
>> 3. To get the new files to compile
>> I messed with the Makefile.in files that are distributed,
>> rather than figuring out how to integrate them into autoconf.
>>
>> Let me know if you have any interest and if I can answer any
>> questions.
>>
>>
>>
>>
>> Jonas Berlin Ph. D.
>> Chief Architect
>> Product & Systems Development
>> Harte-Hanks
>> 25 Linnell Circle
>> Billerica, MA 01821
>> USA
>> Phone +1-978-436-2818
>> Mobile +1-508-361-5921
>> Fax +1-978-439-3940
>> jberlin at hartehanks.com
>>
>>
> _______________________________________________
> torquedev mailing list
> torquedev at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torquedev
>
>


