[torquedev] Mod we use at Harte-Hanks
Garrick Staples
garrick at clusterresources.com
Wed Jul 26 13:14:39 MDT 2006
On Wed, Jul 26, 2006 at 03:51:28PM -0400, Caird, Andrew J alleged:
>
> Is this pretty much the same as moab's GRES functionality?
>
> http://www.clusterresources.com/products/mwm/docs/12.5generalresources.shtml
>
> Not that I think that's bad (or good), I'm just checking.
I don't think so. GRES is per-node, while these tokens are floating
server-wide.
> --andy
>
> > -----Original Message-----
> > From: torquedev-bounces at supercluster.org
> > [mailto:torquedev-bounces at supercluster.org] On Behalf Of
> > Jonas_Berlin at harte-hanks.com
> > Sent: Wednesday, July 26, 2006 3:28 PM
> > To: torquedev at supercluster.org
> > Subject: [torquedev] Mod we use at Harte-Hanks
> >
> >
> > Hi,
> >
> > I initially sent this to Garrick and he suggested I forward
> > it to the whole list.
> > It is basically a way of implementing license management as a
> > counted resource:
> >
> > At Harte-Hanks we have made some mods to the server and the
> > fifo scheduler to be able to use counted resources.
> > Basically there is a new attribute vector for the server
> > called tokens. A token is a string followed by a float, e.g.
> > "foobar:2".
> > So, the server owns a set of counted resources (see last lines):
> >
> > Qmgr: p s
> > #
> > # Create queues and set their attributes.
> > #
> > #
> > # Create and define queue default
> > #
> > create queue default
> > set queue default queue_type = Execution
> > set queue default enabled = True
> > set queue default started = True
> > #
> > # Create and define queue batch
> > #
> > create queue batch
> > set queue batch queue_type = Execution
> > set queue batch resources_default.nodes = 1
> > set queue batch resources_default.walltime = 01:00:00
> > set queue batch enabled = True
> > set queue batch started = True
> > #
> > # Set server attributes.
> > #
> > set server scheduling = True
> > set server acl_hosts = abinitio-rd0
> > set server managers = jberlin at abinitio-rd0
> > set server default_queue = batch
> > set server log_events = 511
> > set server mail_from = adm
> > set server query_other_jobs = True
> > set server scheduler_iteration = 600
> > set server node_check_rate = 150
> > set server tcp_timeout = 6
> > set server pbs_version = 2.1.1
> > set server tokens = radi:2
> > set server tokens += foobar:2
> > set server tokens += hoobar:3
> >
> > When a user submits a job through qsub they specify the
> > requested counted resource: "-l tokens=foobar:1"
> >
> > The fifo scheduler aggregates all token usage across all
> > queues, and if the requested resource is available the job
> > runs, otherwise it remains queued:
> >
> > Job Id: 10.abinitio-rd0
> > Job_Name = hello2.ksh
> > Job_Owner = jberlin at abinitio-rd0
> > job_state = Q
> > queue = batch
> > server = abinitio-rd0
> > Checkpoint = u
> > ctime = Wed Jul 12 15:08:12 2006
> > Error_Path = abinitio-rd0:/home/jberlin/hello2.ksh.e10
> > Hold_Types = n
> > Join_Path = n
> > Keep_Files = n
> > Mail_Points = a
> > mtime = Wed Jul 12 15:08:12 2006
> > Output_Path = abinitio-rd0:/home/jberlin/hello2.ksh.o10
> > Priority = 0
> > qtime = Wed Jul 12 15:08:12 2006
> > Rerunable = True
> > Resource_List.neednodes = 1
> > Resource_List.nodect = 1
> > Resource_List.nodes = 1
> > Resource_List.tokens = foobar:2
> > Resource_List.walltime = 01:00:00
> > substate = 10
> > Variable_List = PBS_O_HOME=/home/jberlin,PBS_O_LANG=en_US.UTF-8,
> > PBS_O_LOGNAME=jberlin,
> > PBS_O_PATH=/usr/java/j2sdk1.4.2_11/bin:/usr/atria/bin/:
> > /prod/software/bin:/usr/local/bin:/opt/syncsort/bin:
> > /opt/SUNWspro/bin:/tools/bin:/bin:/usr/bin:/usr/ucb:/usr/ccs/bin:
> > /etc:/usr/etc:/usr/bin/X11:/bin:.:/usr/local/abinitio/bin:
> > /usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin:
> > /share/change/Opus:/share/change:/usr/local/abinitio/bin:
> > /sandbox/abinitio/sand/stdenv/tools:/u01/app/oracle/product/10.1.0.3:
> > /u01/app/oracle/product/10.1.0.3/bin:
> > /u01/app/oracle/product/10.1.0.3/lib,
> > PBS_O_MAIL=/var/spool/mail/jberlin,PBS_O_SHELL=/bin/ksh,
> > PBS_O_HOST=abinitio-rd0,PBS_O_WORKDIR=/home/jberlin,PBS_O_QUEUE=batch
> > euser = jberlin
> > egroup = pdgrp
> > queue_rank = 2
> > queue_type = E
> > comment = Not Running: Max token usage reached
> > etime = Wed Jul 12 15:08:12 2006
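
The scheduling rule described above (sum token usage over all queues, run the job if its request still fits in the server pool, otherwise leave it queued) can be sketched roughly as follows. This is an illustrative Python sketch, not code from the patch: the function names parse_token, tokens_in_use and can_run are hypothetical, and it assumes job 9 already holds both foobar tokens when job 10 is considered.

```python
from typing import Dict, List, Tuple


def parse_token(spec: str) -> Tuple[str, float]:
    """Split a token string such as "foobar:2" into (name, count)."""
    name, _, count = spec.partition(":")
    if not name or not count:
        raise ValueError(f"malformed token spec: {spec!r}")
    return name, float(count)


def tokens_in_use(running_requests: List[str]) -> Dict[str, float]:
    """Sum token usage over every running job, regardless of queue."""
    usage: Dict[str, float] = {}
    for spec in running_requests:
        name, count = parse_token(spec)
        usage[name] = usage.get(name, 0.0) + count
    return usage


def can_run(request: str, pool: Dict[str, float], usage: Dict[str, float]) -> bool:
    """A job may start only if its request fits in what is left of the pool."""
    name, count = parse_token(request)
    return usage.get(name, 0.0) + count <= pool.get(name, 0.0)


# Server pool as configured with "set server tokens" in the qmgr listing.
pool = dict(parse_token(s) for s in ["radi:2", "foobar:2", "hoobar:3"])
# Assumption: one running job (job 9) holds foobar:2.
usage = tokens_in_use(["foobar:2"])

print(can_run("foobar:2", pool, usage))  # job 10's request -> False
print(can_run("hoobar:1", pool, usage))  # an untouched token -> True
```

Under these assumptions the foobar:2 request is refused, which matches the "Max token usage reached" comment on job 10.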
> >
> > There is also accounting of who is using what tokens (A means
> > allocation at startup to a specific job, U means current
> > usage across the server, and P means current pool owned by
> > the server):
> >
> > bash-2.05b# more /var/torque/sched_priv/accounting/20060711
> > 07/11/2006 16:14:36;A;4.abinitio-rd0;radi:1
> > 07/11/2006 16:14:36;U;;radi:1.00
> > 07/11/2006 16:14:36;P;;radi:2.00
> > 07/11/2006 16:18:01;A;6.abinitio-rd0;radi:1
> > 07/11/2006 16:18:01;U;;radi:1.00
> > 07/11/2006 16:18:01;P;;radi:2.00
> > 07/11/2006 16:25:49;A;7.abinitio-rd0;foobar:1
> > 07/11/2006 16:25:49;U;;foobar:1.00
> > 07/11/2006 16:25:49;P;;radi:14.00,foobar:2.00
> > 07/12/2006 15:08:08;A;9.abinitio-rd0;foobar:2
> > 07/12/2006 15:08:08;U;;foobar:2.00
> > 07/12/2006 15:08:08;P;;radi:2.00,foobar:2.00,hoobar:3.00
> > 07/12/2006 15:09:08;A;10.abinitio-rd0;foobar:2
> > 07/12/2006 15:09:08;U;;foobar:4.00
> > 07/12/2006 15:09:08;P;;radi:4.00,foobar:4.00,hoobar:6.00
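
For anyone wanting to post-process this accounting file, the records above look like four semicolon-separated fields: timestamp, record type (A, U or P), job id (empty for U and P), and a comma-separated token list. A minimal Python sketch of a reader, assuming that layout holds for every line (TokenRecord and parse_record are hypothetical names, not part of the patch):

```python
from datetime import datetime
from typing import Dict, NamedTuple


class TokenRecord(NamedTuple):
    when: datetime
    kind: str                 # "A" allocation, "U" usage, "P" pool
    job_id: str               # empty for U and P records
    tokens: Dict[str, float]


def parse_record(line: str) -> TokenRecord:
    """Parse one line such as '07/12/2006 15:08:08;U;;foobar:2.00'."""
    stamp, kind, job_id, token_field = line.strip().split(";")
    tokens: Dict[str, float] = {}
    for item in token_field.split(","):
        name, _, count = item.partition(":")
        tokens[name] = float(count)
    when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    return TokenRecord(when, kind, job_id, tokens)


# Three records copied from the log excerpt above.
sample = [
    "07/12/2006 15:08:08;A;9.abinitio-rd0;foobar:2",
    "07/12/2006 15:08:08;U;;foobar:2.00",
    "07/12/2006 15:08:08;P;;radi:2.00,foobar:2.00,hoobar:3.00",
]
records = [parse_record(line) for line in sample]

# Free tokens = pool (P record) minus current usage (U record).
used = records[1].tokens
pool = records[2].tokens
free = {name: total - used.get(name, 0.0) for name, total in pool.items()}
print(free)  # {'radi': 2.0, 'foobar': 0.0, 'hoobar': 3.0}
```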
> >
> > Here is a patch against 2.1.1.
> >
> > There are two other changes in here that you probably don't want.
> >
> > 1. The abwritelog command with the associated abwritelog in
> > Libcmds. This is used for runtime logging in our ETL
> > environment and very specific to our ETL tool.
> > 2. Since we run the ETL GUI interactively through qsub -I, I
> > messed with qsub, in order to not require qsub -I to be run
> > from a terminal.
> >
> > There are a few bad things about the token implementation:
> >
> > 1. The location of the accounting directory is hard coded.
> > Since the fifo scheduler appears to be using a slightly
> > different make system than the rest of Torque I couldn't
> > figure out how to make the same configuration mechanism that
> > is used in the server work for the fifo scheduler.
> > 2. The definition of the additional server and job attributes
> > should probably be moved from the *site*.ht files to the main
> > .h files to be consistent.
> > 3. To get the new files to compile
> > I messed with the Makefile.in files that are distributed,
> > rather than figuring out how to integrate it into autoconf.
> >
> > Let me know if you have any interest and if I can answer any
> > questions.
> >
> >
> >
> >
> > Jonas Berlin Ph. D.
> > Chief Architect
> > Product & Systems Development
> > Harte-Hanks
> > 25 Linnell Circle
> > Billerica, MA 01821
> > USA
> > Phone +1-978-436-2818
> > Mobile +1-508-361-5921
> > Fax +1-978-439-3940
> > jberlin at hartehanks.com
> >
> >
> _______________________________________________
> torquedev mailing list
> torquedev at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torquedev