[torquedev] Mod we use at Harte-Hanks

Garrick Staples garrick at clusterresources.com
Wed Jul 26 13:14:39 MDT 2006


On Wed, Jul 26, 2006 at 03:51:28PM -0400, Caird, Andrew J alleged:
> 
> Is this pretty much the same as moab's GRES functionality?
> 
> http://www.clusterresources.com/products/mwm/docs/12.5generalresources.s
> html
> 
> Not that I think that's bad (or good), I'm just checking.

I don't think so.  GRES is per-node, while these tokens are floating
server-wide.

 
> --andy
> 
> > -----Original Message-----
> > From: torquedev-bounces at supercluster.org 
> > [mailto:torquedev-bounces at supercluster.org] On Behalf Of 
> > Jonas_Berlin at harte-hanks.com
> > Sent: Wednesday, July 26, 2006 3:28 PM
> > To: torquedev at supercluster.org
> > Subject: [torquedev] Mod we use at Harte-Hanks
> > 
> > 
> > Hi, 
> > 
> > I initially sent this to Garrick and he suggested I forward 
> > it to the whole list. 
> > It is basically a way of implementing license management as a 
> > counted resource: 
> > 
> > At Harte-Hanks we have made some mods to the server and the 
> > fifo scheduler to be able to use counted resources. 
> > Basically there is a new attribute vector for the server 
> > called tokens. A token is a string followed by a float, e.g. 
> > "foobar:2". 
> > So the server owns a set of counted resources (see the last lines): 
> > 
> > Qmgr: p s
> > #
> > # Create queues and set their attributes. 
> > #
> > #
> > # Create and define queue default
> > #
> > create queue default
> > set queue default queue_type = Execution
> > set queue default enabled = True
> > set queue default started = True
> > #
> > # Create and define queue batch
> > #
> > create queue batch
> > set queue batch queue_type = Execution
> > set queue batch resources_default.nodes = 1
> > set queue batch resources_default.walltime = 01:00:00
> > set queue batch enabled = True
> > set queue batch started = True
> > #
> > # Set server attributes. 
> > #
> > set server scheduling = True
> > set server acl_hosts = abinitio-rd0
> > set server managers = jberlin at abinitio-rd0
> > set server default_queue = batch
> > set server log_events = 511
> > set server mail_from = adm
> > set server query_other_jobs = True
> > set server scheduler_iteration = 600
> > set server node_check_rate = 150
> > set server tcp_timeout = 6
> > set server pbs_version = 2.1.1
> > set server tokens = radi:2
> > set server tokens += foobar:2
> > set server tokens += hoobar:3
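> > 
> > (The token names radi/foobar/hoobar are just examples.  The pool 
> > should also be adjustable from the shell with qmgr's -c option, e.g. 
> > 
> >     qmgr -c "set server tokens += newtok:5"    # "newtok" is made up 
> > 
> > using the same name:count syntax as above.) 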
> > 
> > When a user submits a job through qsub, they specify the 
> > requested counted resource: "-l tokens=foobar:1" 
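> > 
> > For example, something like (queue and script name are the ones from 
> > the run below; the token count is just an illustration): 
> > 
> >     qsub -q batch -l tokens=foobar:1 hello2.ksh 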
> > 
> > The fifo scheduler aggregates all token usage across all 
> > queues, and if the requested resource is available the job 
> > runs, otherwise it remains queued: 
> > 
> > Job Id: 10.abinitio-rd0 
> >     Job_Name = hello2.ksh 
> >     Job_Owner = jberlin at abinitio-rd0 
> >     job_state = Q 
> >     queue = batch 
> >     server = abinitio-rd0 
> >     Checkpoint = u 
> >     ctime = Wed Jul 12 15:08:12 2006 
> >     Error_Path = abinitio-rd0:/home/jberlin/hello2.ksh.e10 
> >     Hold_Types = n 
> >     Join_Path = n 
> >     Keep_Files = n 
> >     Mail_Points = a 
> >     mtime = Wed Jul 12 15:08:12 2006 
> >     Output_Path = abinitio-rd0:/home/jberlin/hello2.ksh.o10 
> >     Priority = 0 
> >     qtime = Wed Jul 12 15:08:12 2006 
> >     Rerunable = True 
> >     Resource_List.neednodes = 1 
> >     Resource_List.nodect = 1 
> >     Resource_List.nodes = 1 
> >     Resource_List.tokens = foobar:2 
> >     Resource_List.walltime = 01:00:00 
> >     substate = 10 
> >     Variable_List = PBS_O_HOME=/home/jberlin,PBS_O_LANG=en_US.UTF-8, 
> >         PBS_O_LOGNAME=jberlin, 
> >         PBS_O_PATH=/usr/java/j2sdk1.4.2_11/bin:/usr/atria/bin/: 
> >         /prod/software/bin:/usr/local/bin:/opt/syncsort/bin: 
> >         /opt/SUNWspro/bin:/tools/bin:/bin:/usr/bin:/usr/ucb: 
> >         /usr/ccs/bin:/etc:/usr/etc:/usr/bin/X11:/bin:.: 
> >         /usr/local/abinitio/bin:/usr/kerberos/bin:/usr/local/bin: 
> >         /bin:/usr/bin:/usr/X11R6/bin:/share/change/Opus: 
> >         /share/change:/usr/local/abinitio/bin: 
> >         /sandbox/abinitio/sand/stdenv/tools: 
> >         /u01/app/oracle/product/10.1.0.3: 
> >         /u01/app/oracle/product/10.1.0.3/bin: 
> >         /u01/app/oracle/product/10.1.0.3/lib, 
> >         PBS_O_MAIL=/var/spool/mail/jberlin,PBS_O_SHELL=/bin/ksh, 
> >         PBS_O_HOST=abinitio-rd0,PBS_O_WORKDIR=/home/jberlin, 
> >         PBS_O_QUEUE=batch 
> >     euser = jberlin 
> >     egroup = pdgrp 
> >     queue_rank = 2 
> >     queue_type = E 
> >     comment = Not Running: Max token usage reached 
> >     etime = Wed Jul 12 15:08:12 2006 
> > 
> > There is also accounting of who is using what tokens (A means 
> > allocation at startup to a specific job, U means current 
> > usage across the server, and P means current pool owned by 
> > the server): 
> > 
> > bash-2.05b# more /var/torque/sched_priv/accounting/20060711
> > 07/11/2006 16:14:36;A;4.abinitio-rd0;radi:1
> > 07/11/2006 16:14:36;U;;radi:1.00
> > 07/11/2006 16:14:36;P;;radi:2.00
> > 07/11/2006 16:18:01;A;6.abinitio-rd0;radi:1
> > 07/11/2006 16:18:01;U;;radi:1.00
> > 07/11/2006 16:18:01;P;;radi:2.00
> > 07/11/2006 16:25:49;A;7.abinitio-rd0;foobar:1
> > 07/11/2006 16:25:49;U;;foobar:1.00
> > 07/11/2006 16:25:49;P;;radi:14.00,foobar:2.00
> > 07/12/2006 15:08:08;A;9.abinitio-rd0;foobar:2
> > 07/12/2006 15:08:08;U;;foobar:2.00
> > 07/12/2006 15:08:08;P;;radi:2.00,foobar:2.00,hoobar:3.00
> > 07/12/2006 15:09:08;A;10.abinitio-rd0;foobar:2
> > 07/12/2006 15:09:08;U;;foobar:4.00
> > 07/12/2006 15:09:08;P;;radi:4.00,foobar:4.00,hoobar:6.00 
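> > 
> > (As a rough illustration only, the most recent usage and pool records 
> > can be pulled out of that file with plain shell: 
> > 
> >     # last server-wide usage (U) and pool (P) lines for this day 
> >     grep ';U;;' /var/torque/sched_priv/accounting/20060711 | tail -1 
> >     grep ';P;;' /var/torque/sched_priv/accounting/20060711 | tail -1 
> > 
> > The difference between the two is roughly what is still free to hand 
> > out.) 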
> > 
> > Here is a patch against 2.1.1. 
> > 
> > There are two other changes in here that you probably don't want. 
> > 
> > 1. The abwritelog command with the associated abwritelog in 
> > Libcmds. This is used for runtime logging in our ETL 
> > environment and is very specific to our ETL tool. 
> > 2. Since we run the ETL GUI interactively through qsub -I, I 
> > messed with qsub so that qsub -I does not have to be run 
> > from a terminal. 
> > 
> > There are a few bad things about the token implementation: 
> > 
> > 1. The location of the accounting directory is hard coded. 
> > Since the fifo scheduler appears to use a slightly 
> > different make system than the rest of Torque, I couldn't 
> > figure out how to make the configuration mechanism that 
> > is used in the server work for the fifo scheduler. 
> > 2. The definition of the additional server and job attributes 
> > should probably be moved from the *site*.ht files to the main 
> > .h files to be consistent. 
> > 3. To get the new files to compile, I messed with the 
> > Makefile.in files that are distributed, rather than figuring 
> > out how to integrate them into autoconf. 
> > 
> > Let me know if you have any interest and if I can answer any 
> > questions. 
> > 
> > 
> > 
> > 
> > Jonas Berlin Ph. D. 
> > Chief Architect
> > Product & Systems Development
> > Harte-Hanks
> > 25 Linnell Circle
> > Billerica, MA 01821
> > USA
> > Phone +1-978-436-2818
> > Mobile +1-508-361-5921
> > Fax +1-978-439-3940
> > jberlin at hartehanks.com 
> > 
> > 
> _______________________________________________
> torquedev mailing list
> torquedev at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torquedev

