Bug 92 - Enhancement to enable dynamic tokens
Status: NEW
Product: TORQUE
Component: pbs_sched
Version: 2.5.x
Hardware: All All
Importance: P5 enhancement
Assigned To: Glen
Reported: 2010-10-15 14:28 MDT by joshua.weage
Modified: 2010-10-16 04:01 MDT
CC List: 2 users

Attachments
Dynamic token patches (4.24 KB, application/x-compressed)
2010-10-15 14:28 MDT, joshua.weage


Description joshua.weage 2010-10-15 14:28:41 MDT
Created an attachment (id=59)
Dynamic token patches

These patches enable the C FIFO scheduler to properly schedule jobs that require
network-based licenses, such as FlexLM licenses, which may also be used by
software not running under the control of Torque.

This was implemented using the existing tokens capability in the server and C
FIFO scheduler.  This has been in production on one cluster for 18 months with
Torque 2.3.6.  I recently updated the patches for 2.5.2.

There are three patches.  The first changes the token limit to 1000000 using a
#defined value in pbs_server.  The second contains all of the necessary changes
to the scheduler.  The third adds documentation for the tokens resource and
server attribute to the man pages.
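A minimal sketch of the idea behind the first patch: the token ceiling lives in a single #define, and a token specification whose count exceeds it is rejected. This is not code from the attachment; MAX_TOKEN_COUNT and parse_token_spec() are hypothetical names, and the "name:count" format for a token is an assumption made for illustration.

```c
/* Illustrative sketch only. MAX_TOKEN_COUNT and parse_token_spec() are
 * hypothetical names, not identifiers from pbs_server or the patches.
 * The "name:count" spec format is also an assumption. */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* One #define so the ceiling can be changed in a single place. */
#define MAX_TOKEN_COUNT 1000000L

/* Parse a "name:count" token specification.
 * Copies the name into name_out (must fit in name_len bytes including the
 * terminator) and stores the count in *count_out.
 * Returns 0 on success, -1 if the spec is malformed or the count is
 * outside 1..MAX_TOKEN_COUNT. */
int parse_token_spec(const char *spec, char *name_out, size_t name_len,
                     long *count_out)
{
    const char *colon;
    size_t nlen;
    char *end;
    long count;

    assert(spec != NULL && name_out != NULL && count_out != NULL);

    colon = strchr(spec, ':');
    if (colon == NULL || colon == spec)
        return -1;                      /* missing ':' or empty name */

    nlen = (size_t)(colon - spec);
    if (nlen >= name_len)
        return -1;                      /* name does not fit the buffer */

    errno = 0;
    count = strtol(colon + 1, &end, 10);
    if (errno != 0 || end == colon + 1 || *end != '\0')
        return -1;                      /* not a clean decimal integer */
    if (count < 1 || count > MAX_TOKEN_COUNT)
        return -1;                      /* outside the allowed range */

    memcpy(name_out, spec, nlen);
    name_out[nlen] = '\0';
    *count_out = count;
    return 0;
}
```

Keeping the limit in one macro means a later change to the ceiling touches one line rather than every comparison site.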

Ideally, this should be rewritten so that the C FIFO scheduler understands
generic resources; however, since the token capability already existed, I
simply extended it.
Comment 1 Simon Toth 2010-10-16 03:31:11 MDT
(In reply to comment #0)

This is interesting. We have a fully resource-aware server and scheduler.
The server part is in Bugzilla here:
http://www.clusterresources.com/bugzilla/show_bug.cgi?id=67

But we also have a separate external daemon for dynamic resources like
licenses. Do you store the licenses directly on the server? How do you handle
the delay?
Comment 2 joshua.weage 2010-10-16 03:53:23 MDT
(In reply to comment #1)

As I have implemented it, for a "dynamic token" the scheduler runs an external
script to obtain the current number of available tokens/licenses rather than
using the value specified on the server. I also added a "dynamic token delay"
parameter to the scheduler, which delays subsequent jobs requesting the same
"dynamic token" by a specified number of seconds before they are scheduled. A
sufficiently long delay gives the previous job enough time to start and obtain
its licenses before the next job needing the same software licenses is
scheduled.
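The dynamic token mechanism described above can be sketched as two checks: ask an external command how many licenses are free right now, and refuse to start another job on the same token until the configured delay has elapsed since the previous start. This is a hedged illustration, not the patch itself: query_dynamic_tokens(), token_delay_ok(), can_start_job() and the convention that the script prints a single integer on its first output line are all assumptions.

```c
/* Illustrative sketch only. All function names are hypothetical and the
 * "script prints one integer" convention is assumed, not taken from the
 * attached patches. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Run an external command that prints the number of currently free
 * licenses on the first line of its stdout. Returns that count, or -1 if
 * the command could not be run, printed nothing parsable, or exited
 * with a non-zero status. */
long query_dynamic_tokens(const char *command)
{
    char line[64];
    long available = -1;
    FILE *fp;

    assert(command != NULL);

    fp = popen(command, "r");
    if (fp == NULL)
        return -1;

    if (fgets(line, sizeof line, fp) != NULL) {
        char *end;
        long v = strtol(line, &end, 10);
        if (end != line && v >= 0)
            available = v;
    }

    if (pclose(fp) != 0)
        return -1;          /* treat a failed query as "count unknown" */
    return available;
}

/* The "dynamic token delay": last_start is when the previous job using the
 * same token was started (0 if none yet). Returns 1 if at least
 * delay_seconds have passed since then, else 0. */
int token_delay_ok(time_t last_start, time_t now, double delay_seconds)
{
    if (last_start == 0)
        return 1;
    return difftime(now, last_start) >= delay_seconds;
}

/* Combine both checks: the delay since the previous start has elapsed,
 * and the external query reports at least `needed` free licenses. */
int can_start_job(const char *command, long needed,
                  time_t last_start, time_t now, double delay_seconds)
{
    long available;

    if (!token_delay_ok(last_start, now, delay_seconds))
        return 0;           /* previous job may not hold its licenses yet */

    available = query_dynamic_tokens(command);
    return available >= 0 && available >= needed;
}
```

The delay is checked before the script runs, so jobs that would be held back anyway do not trigger a license-server query. In practice the command might be a small wrapper around a license manager's status tool (for FlexLM, `lmutil lmstat`); that wrapper and its output format are assumed here.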
Comment 3 Simon Toth 2010-10-16 04:01:56 MDT
(In reply to comment #2)

Oh, yes, that is similar; we just have another middle-man that stores the
data, and the scheduler reads it when needed.

By the way, if you are interested, our code is here:

http://thrain.ics.muni.cz/git/?p=meta_torque.git;a=summary

http://goo.gl/x2PV is a direct link to our scheduling branch in the public git
repository; the scheduler is in scheduler.cc/samples/world (it's based on FIFO).