[torquedev] [torqueusers] 3.0-alpha branch added to TORQUE subversion tree

Ken Nielson knielson at adaptivecomputing.com
Thu Apr 22 13:47:48 MDT 2010


Garrick Staples wrote:
> On Thu, Apr 22, 2010 at 12:17:35PM -0600, Ken Nielson alleged:
>   
>> Let me know if you have questions. The code does run. We were able to 
>> get it to work on a 3000 plus node cluster. But I am sure there is much 
>> to flesh out.
>>     
>
> That's very cool that it is already running! Congrats!
>
> I assume the final design won't leave it up to the user, right?  It should just
> be a server-wide config that is set once (or automatically by).
Garrick,

I look forward to everyone's input on this question. The main reason it 
is not currently the default is that at design time we did not know how 
it would behave. We also did not want to change the behavior that users 
have come to expect. However, since this happens in the background, 
users do not really know, and probably do not care, how MOMs communicate.

We have found that different-sized radixes (or is that radices?) give 
slightly different results, but overall you can just pick a number. I 
foresee a default value, a system-wide value, and then a per-job value 
submitted with qsub. With a radix value as the default we can enable 
MOMs to use TCP for communication. We will likely still need UDP for 
MOM-to-server communication. Better yet would be a scheme that gets MOM 
configuration information to the server with less traffic. I am open 
to any ideas.
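
To make the three-level idea concrete, here is a rough Python sketch of how the lookup could work, along with a quick way to see why the exact radix matters little on a large cluster. Everything named here is hypothetical: resolve_radix, tree_levels, and the built-in default of 4 are illustrations, not TORQUE code or settings. It also assumes the radix is the number of MOMs each MOM forwards to in a tree rooted at one MOM.

```python
from typing import Optional

# Hypothetical built-in default, chosen only for illustration.
BUILTIN_DEFAULT_RADIX = 4


def _check_radix(radix: int) -> int:
    """Reject values that cannot form a fan-out tree."""
    if radix < 2:
        raise ValueError("radix must be at least 2")
    return radix


def resolve_radix(per_job: Optional[int] = None,
                  system_wide: Optional[int] = None,
                  default: int = BUILTIN_DEFAULT_RADIX) -> int:
    """Return the most specific radix that was set.

    Assumed precedence: per-job value, then system-wide value,
    then the built-in default.
    """
    for value in (per_job, system_wide):
        if value is not None:
            return _check_radix(value)
    return _check_radix(default)


def tree_levels(node_count: int, radix: int) -> int:
    """Count the forwarding levels below the root MOM needed to
    reach node_count nodes when every MOM forwards to radix others."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    _check_radix(radix)
    levels = 0
    reached = 1  # the root MOM itself
    width = 1    # number of MOMs on the current level
    while reached < node_count:
        width *= radix
        reached += width
        levels += 1
    return levels


if __name__ == "__main__":
    radix = resolve_radix(per_job=None, system_wide=10)
    print(radix, tree_levels(3000, radix))
```

Under these assumptions a 3000-node job needs 11 levels with a radix of 2, 6 levels with a radix of 4, and 4 levels with a radix of 10, so most reasonable choices keep the tree shallow.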

Ken

