[torquedev] [torqueusers] TORQUE 3.0.0 Released

Ken Nielson knielson at adaptivecomputing.com
Wed Dec 8 10:34:38 MST 2010


On 12/08/2010 08:14 AM, Lloyd Brown wrote:
> On 12/6/10 4:38 PM, Ken Nielson wrote:
>    
>> This version of TORQUE can be built to work the same as other versions
>> of TORQUE without the NUMA option. However, we would recommend the use
>> of TORQUE 2.5.3 if you do not require NUMA capability.
>>      
> Ken, et al.,
>
> Can you be more specific on this recommendation?  We're looking at
> upgrading from 2.4.x to 2.5.x during our downtime in early January, and
> due to the communication protocol change from 2.x to 3.x, we're
> wondering about upgrading all the way to 3.0.x, to make future rolling
> upgrades easier.  We don't have a big Altix or anything, but just a few
> hundred x86_64 Linux servers.  Is there a specific concern about 3.0.x
> on non-NUMA clusters?  Are there outstanding, known issues?
>
> Lloyd
>
>    
I would recommend going to 2.5.x if you do not need the NUMA support. 
Like any .0 release, 3.0.0 contains a number of code changes, and each 
one carries some risk of new problems. The 2.5.x code is more stable and 
better tested than the 3.0.x branch for non-NUMA functionality. All of 
the 2.5.x capability is in 3.0.x, but you will probably find more 
stability with 2.5.x.

The best way to answer the rest of your questions is to address the 
TORQUE road map.

The next release of 2.4-fixes will be 2.4.12. 2.4-fixes will continue to 
be the stable branch for TORQUE. By stable we mean that no new features 
will be added to 2.4-fixes, only bug fixes.

2.5-fixes is becoming more stable, and we would recommend moving to the 
latest 2.5 release when you are ready. 2.5-fixes will continue to 
receive new features along with bug fixes. We do not call this the 
stable branch simply because we may still make feature changes. However, 
the code base itself has proven to be pretty reliable.

We will be adding GPU support starting with 2.5.4, which we hope to 
release this month along with Moab 6. Moab 6 also has GPU support that 
will work with TORQUE.

In March we hope to release TORQUE 3.1.

The major thrust of TORQUE 3.1 is scalability. Some of the things we 
will be doing to improve scalability are as follows:

    * Create a multi-threaded TORQUE
    * Use a hierarchical job launch (job radix)
    * Improve mom-to-mom and mom-to-server communications to reduce
      traffic needed to keep the server and moms up to date on the state
      of the cluster
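
As a rough illustration of the hierarchical launch idea only (this is 
not TORQUE code, and the function names and node-numbering scheme are 
assumptions made for the sketch), each node forwards the job to at most 
"radix" other nodes, which in turn forward it further:

```python
def build_launch_tree(hosts, radix):
    """Map each host to the hosts it would forward the job to.

    Hypothetical layout: hosts[0] is the first node of the job, and the
    host at index i forwards to indexes i*radix+1 .. i*radix+radix.
    """
    if radix < 1:
        raise ValueError("radix must be at least 1")
    tree = {}
    for i, host in enumerate(hosts):
        first_child = i * radix + 1
        # Slicing past the end of the list simply yields fewer (or no) children.
        tree[host] = hosts[first_child:first_child + radix]
    return tree


def launch_depth(hosts, radix):
    """Number of forwarding hops needed to reach the last host."""
    depth = 0
    i = len(hosts) - 1
    while i > 0:
        i = (i - 1) // radix  # index of the host that forwards to host i
        depth += 1
    return depth
```

With 13 hosts and a radix of 3, the first node contacts only 3 hosts 
directly and every host is reached within 2 hops, instead of the first 
node contacting all 12 others itself.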

Because TORQUE is so chatty when the moms send updates to the server, we 
may need to change how this is done. This may make 3.1 INCOMPATIBLE with 
all previous versions of TORQUE, so moving to 3.0.0 now may not make a 
later upgrade to 3.1 any easier.

Trunk currently has changes for multi-threading if anyone wants to check 
it out. Any recommendations for improvements or reports of problems are 
welcomed and encouraged.

To summarize, unless you need NUMA support, we recommend that you 
continue to use or upgrade to version 2.5.x. After version 3.1.0 is 
released, we will start encouraging all users to upgrade as it becomes 
more stable.

Let me know if you have more questions.

Ken