[torquedev] Trunk And Multithreading

Ken Nielson knielson at adaptivecomputing.com
Fri Dec 10 12:40:18 MST 2010

On 12/10/2010 10:11 AM, Glen Beane wrote:
> I like the model we used for the major job-array changes where they
> were developed in a branch and folded back in when they were stable,
> but if TORQUE 4.0 will be delayed until multi-threading is rock solid
> then I can live with it in trunk.  I think rushing this will be a big
> mistake, and I'm sure you have no such plans, but sometimes there are
> pressures to get a release out the door before it is really ready...
> I'd like to think a little about the resizable array and out of order
> jobs issue, although that is minor at this point - stability is
> definitely higher priority.

Thank you for your comments. We understand the words of caution from you 
and Simon. We are also conscious of the fact that we are moving forward 
with a lot of changes and not always getting input from the community. 
There is a bit of urgency to get TORQUE to the point it can scale. SLURM 
is moving forward with their scaling ability in response to users who 
are currently creating clusters with 10,000 plus nodes and multiple cores.

We are hearing of plans to build systems with over 100,000 nodes and 
right now TORQUE cannot manage such a system. I have published on this 
list and at SC'10 what we plan for TORQUE 4.0 (3.1 in my SC'10 
presentation). We are 1) making TORQUE multi-threaded, 2) adding a
hierarchical job launch, and 3) changing the way Server-to-MOM
and MOM-to-MOM communication works. Any and all ideas about how to
improve these are welcome and encouraged.

We have chosen to put this work into trunk with the knowledge that it 
will create instability. But we also have confidence that we will be 
able to address the stability problems as the new version is deployed. 
In the meantime, the 2.4, 2.5 and 3.0 branches remain
available for use. 2.5 and 3.0 can also be improved with minor feature
changes.

Contributions to the code and conversation are encouraged. We want to be
open while at the same time moving ahead with the changes needed to keep
TORQUE relevant as a resource manager.

Thanks again for your comments.


