[torquedev] Mod we use at Harte-Hanks

Caird, Andrew J acaird at umich.edu
Wed Jul 26 13:51:28 MDT 2006


Is this pretty much the same as moab's GRES functionality?

http://www.clusterresources.com/products/mwm/docs/12.5generalresources.shtml

Not that I think that's bad (or good), I'm just checking.

--andy

> -----Original Message-----
> From: torquedev-bounces at supercluster.org 
> [mailto:torquedev-bounces at supercluster.org] On Behalf Of 
> Jonas_Berlin at harte-hanks.com
> Sent: Wednesday, July 26, 2006 3:28 PM
> To: torquedev at supercluster.org
> Subject: [torquedev] Mod we use at Harte-Hanks
> 
> 
> Hi, 
> 
> I initially sent this to Garrick and he suggested I forward 
> it to the whole list. 
> It is basically a way of implementing license management as a 
> counted resource: 
> 
> At Harte-Hanks we have made some mods to the server and the 
> fifo scheduler to be able to use counted resources. 
> Basically there is a new attribute vector for the server 
> called tokens. A token is a string followed by a float, e.g. 
> "foobar:2". 
> So the server owns a set of counted resources (see the last lines): 
> 
> Qmgr: p s
> #
> # Create queues and set their attributes. 
> #
> #
> # Create and define queue default
> #
> create queue default
> set queue default queue_type = Execution
> set queue default enabled = True
> set queue default started = True
> #
> # Create and define queue batch
> #
> create queue batch
> set queue batch queue_type = Execution
> set queue batch resources_default.nodes = 1
> set queue batch resources_default.walltime = 01:00:00
> set queue batch enabled = True
> set queue batch started = True
> #
> # Set server attributes.
> #
> set server scheduling = True
> set server acl_hosts = abinitio-rd0
> set server managers = jberlin at abinitio-rd0
> set server default_queue = batch
> set server log_events = 511
> set server mail_from = adm
> set server query_other_jobs = True
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server pbs_version = 2.1.1
> set server tokens = radi:2
> set server tokens += foobar:2
> set server tokens += hoobar:3
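
To make the "string followed by a float" format concrete, here is a minimal Python sketch that reads the three "set server tokens" lines above into a pool. It is not taken from the patch (which is C inside the server); the function names parse_token and build_pool are made up for illustration, and it assumes that "=" replaces the list and "+=" appends to it, as with other qmgr list attributes.

```python
def parse_token(spec):
    """Split a 'name:count' token string into (name, count as float)."""
    name, sep, count = spec.strip().rpartition(":")
    if not sep or not name:
        raise ValueError(f"expected 'name:count', got {spec!r}")
    return name, float(count)


def build_pool(qmgr_lines):
    """Collect 'set server tokens' lines into a {name: count} pool.

    Assumption: '=' starts a fresh list and '+=' appends to it.
    """
    pool = {}
    for line in qmgr_lines:
        words = line.split()
        # Expected shape: set server tokens <op> name:count
        if len(words) != 5 or words[:3] != ["set", "server", "tokens"]:
            continue
        op, spec = words[3], words[4]
        if op == "=":
            pool.clear()
        elif op != "+=":
            continue
        name, count = parse_token(spec)
        pool[name] = count
    return pool


lines = [
    "set server scheduling = True",
    "set server tokens = radi:2",
    "set server tokens += foobar:2",
    "set server tokens += hoobar:3",
]
pool = build_pool(lines)
print(pool)
```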
> 
> When a user submits a job through qsub they specify the 
> requested counted resource: "-l tokens=foobar:1" 
> 
> The fifo scheduler aggregates all token usage across all 
> queues, and if the requested resource is available the job 
> runs, otherwise it remains queued: 
> 
> Job Id: 10.abinitio-rd0 
>     Job_Name = hello2.ksh 
>     Job_Owner = jberlin at abinitio-rd0 
>     job_state = Q 
>     queue = batch 
>     server = abinitio-rd0 
>     Checkpoint = u 
>     ctime = Wed Jul 12 15:08:12 2006 
>     Error_Path = abinitio-rd0:/home/jberlin/hello2.ksh.e10 
>     Hold_Types = n 
>     Join_Path = n 
>     Keep_Files = n 
>     Mail_Points = a 
>     mtime = Wed Jul 12 15:08:12 2006 
>     Output_Path = abinitio-rd0:/home/jberlin/hello2.ksh.o10 
>     Priority = 0 
>     qtime = Wed Jul 12 15:08:12 2006 
>     Rerunable = True 
>     Resource_List.neednodes = 1 
>     Resource_List.nodect = 1 
>     Resource_List.nodes = 1 
>     Resource_List.tokens = foobar:2 
>     Resource_List.walltime = 01:00:00 
>     substate = 10 
>     Variable_List = PBS_O_HOME=/home/jberlin,PBS_O_LANG=en_US.UTF-8, 
>         PBS_O_LOGNAME=jberlin, 
>         PBS_O_PATH=/usr/java/j2sdk1.4.2_11/bin:/usr/atria/bin/:/prod/software
>         /bin:/usr/local/bin:/opt/syncsort/bin:/opt/SUNWspro/bin:/tools/bin:/bi
>         n:/usr/bin:/usr/ucb:/usr/ccs/bin:/etc:/usr/etc:/usr/bin/X11:/bin:.:/us
>         r/local/abinitio/bin:/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/u
>         sr/X11R6/bin:/share/change/Opus:/share/change:/usr/local/abinitio/bin:
>         /sandbox/abinitio/sand/stdenv/tools:/u01/app/oracle/product/10.1.0.3:/
>         u01/app/oracle/product/10.1.0.3/bin:/u01/app/oracle/product/10.1.0.3/l
>         ib,PBS_O_MAIL=/var/spool/mail/jberlin,PBS_O_SHELL=/bin/ksh,
>         PBS_O_HOST=abinitio-rd0,PBS_O_WORKDIR=/home/jberlin,PBS_O_QUEUE=batch
>     euser = jberlin 
>     egroup = pdgrp 
>     queue_rank = 2 
>     queue_type = E 
>     comment = Not Running: Max token usage reached 
>     etime = Wed Jul 12 15:08:12 2006 
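
The run-or-stay-queued decision described above can be sketched roughly as follows. This is only an illustration of the idea, not the logic in the patched fifo scheduler: the names can_run and schedule are hypothetical, jobs are assumed to request a single token each, and the pool is assumed to hold only foobar:2.

```python
def can_run(request, pool, in_use):
    """Return True if the requested tokens fit in what is left of the pool."""
    name, amount = request
    if name not in pool:
        return False  # assumption: unknown tokens are never granted
    return in_use.get(name, 0.0) + amount <= pool[name]


def schedule(jobs, pool):
    """Walk jobs in order, tracking token usage across all queues.

    Returns ({job_id: 'R' or 'Q'}, in_use), where 'Q' means the job
    stays queued because the pool would be exceeded.
    """
    in_use = {}
    decisions = {}
    for job_id, request in jobs:
        if can_run(request, pool, in_use):
            name, amount = request
            in_use[name] = in_use.get(name, 0.0) + amount
            decisions[job_id] = "R"
        else:
            decisions[job_id] = "Q"
    return decisions, in_use


pool = {"foobar": 2.0}
jobs = [
    ("9.abinitio-rd0", ("foobar", 2.0)),
    ("10.abinitio-rd0", ("foobar", 2.0)),
]
decisions, in_use = schedule(jobs, pool)
print(decisions)
print(in_use)
```

With one job already holding foobar:2, the second request for foobar:2 does not fit, which is the situation shown for job 10 in the listing above.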
> 
> There is also accounting of who is using what tokens (A means 
> allocation at startup to a specific job, U means current 
> usage across the server, and P means current pool owned by 
> the server): 
> 
> bash-2.05b# more /var/torque/sched_priv/accounting/20060711
> 07/11/2006 16:14:36;A;4.abinitio-rd0;radi:1
> 07/11/2006 16:14:36;U;;radi:1.00
> 07/11/2006 16:14:36;P;;radi:2.00
> 07/11/2006 16:18:01;A;6.abinitio-rd0;radi:1
> 07/11/2006 16:18:01;U;;radi:1.00
> 07/11/2006 16:18:01;P;;radi:2.00
> 07/11/2006 16:25:49;A;7.abinitio-rd0;foobar:1
> 07/11/2006 16:25:49;U;;foobar:1.00
> 07/11/2006 16:25:49;P;;radi:14.00,foobar:2.00
> 07/12/2006 15:08:08;A;9.abinitio-rd0;foobar:2
> 07/12/2006 15:08:08;U;;foobar:2.00
> 07/12/2006 15:08:08;P;;radi:2.00,foobar:2.00,hoobar:3.00
> 07/12/2006 15:09:08;A;10.abinitio-rd0;foobar:2
> 07/12/2006 15:09:08;U;;foobar:4.00
> 07/12/2006 15:09:08;P;;radi:4.00,foobar:4.00,hoobar:6.00 
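
For anyone who wants to post-process these accounting files, here is a small Python sketch of a record parser. The field layout (timestamp;type;job id;comma-separated name:count list) is inferred from the sample lines above rather than from the patch, and parse_record is a hypothetical helper name.

```python
from datetime import datetime


def parse_record(line):
    """Parse one accounting line into (timestamp, type, job_id, counts).

    type is 'A' (allocation to a job), 'U' (current usage) or
    'P' (current pool); job_id is None for U and P records.
    """
    stamp, kind, job_id, tokens = line.strip().split(";", 3)
    when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    counts = {}
    for item in tokens.split(","):
        name, _, value = item.rpartition(":")
        counts[name] = float(value)
    return when, kind, job_id or None, counts


sample = [
    "07/12/2006 15:08:08;A;9.abinitio-rd0;foobar:2",
    "07/12/2006 15:08:08;U;;foobar:2.00",
    "07/12/2006 15:08:08;P;;radi:2.00,foobar:2.00,hoobar:3.00",
]
records = [parse_record(line) for line in sample]
for when, kind, job_id, counts in records:
    print(when.isoformat(), kind, job_id, counts)
```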
> 
> Here is a patch against 2.1.1. 
> 
> There are two other changes in here that you probably don't want. 
> 
> 1. The abwritelog command with the associated abwritelog in 
> Libcmds. This is used for runtime logging in our ETL 
> environment and is very specific to our ETL tool. 
> 2. Since we run the ETL GUI interactively through qsub -I, I 
> messed with qsub, in order to not require qsub -I to be run 
> from a terminal. 
> 
> There are a few bad things about the token implementation: 
> 
> 1. The location of the accounting directory is hard coded. 
> Since the fifo scheduler appears to be using a slightly 
> different make system than the rest of Torque I couldn't 
> figure out how to make the same configuration mechanism that 
> is used in the server work for the fifo scheduler. 
> 2. The definition of the additional server and job attributes 
> should probably be moved from the *site*.ht files to the main 
> .h files to be consistent. 
> 3. To get the new files to compile I messed with the 
> Makefile.in files that are distributed, rather than figuring 
> out how to integrate it into autoconf. 
> 
> Let me know if you have any interest and if I can answer any 
> questions. 
> 
> 
> 
> 
> Jonas Berlin Ph. D. 
> Chief Architect
> Product & Systems Development
> Harte-Hanks
> 25 Linnell Circle
> Billerica, MA 01821
> USA
> Phone +1-978-436-2818
> Mobile +1-508-361-5921
> Fax +1-978-439-3940
> jberlin at hartehanks.com 
> 
> 