[torquedev] [torqueusers] 3.0-alpha branch added to TORQUE subversion tree
Ken Nielson
knielson at adaptivecomputing.com
Thu Apr 22 13:47:48 MDT 2010
Garrick Staples wrote:
> On Thu, Apr 22, 2010 at 12:17:35PM -0600, Ken Nielson alleged:
>
>> Let me know if you have questions. The code does run. We were able to
>> get it to work on a 3000 plus node cluster. But I am sure there is much
>> to flesh out.
>>
>
> That's very cool that it is already running! Congrats!
>
> I assume the final design won't leave it up to the user, right? It should just
> be a server-wide config that is set once (or automatically by).
Garrick,
I look forward to everyone's input on this question. The main reason it
is not currently the default is that, during design, we did not know how
it would behave. We also did not want to change the behavior that users
have come to expect. However, since this happens in the background,
users do not really know, or probably care, how MOMs communicate.
We have found that different radix sizes (or is that radices?) give
slightly different results, but overall you can just pick a number. I
foresee that there will be a default value, a system-wide value, and then
a per-job value submitted with qsub. With a radix value as the default,
we can enable MOMs to use TCP for communication. We will likely still
need UDP for MOM-to-server communication. Better yet would be a scheme
that gets MOM configuration information to the server with less traffic.
I am open to any ideas.
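To make the two ideas above concrete, here is a minimal sketch in Python, assuming a heap-style numbering of a job's MOMs where index 0 is the first MOM (mother superior) and each MOM contacts at most `radix` others. It also shows the default / system-wide / per-job precedence described above. All names (`effective_radix`, `job_radix`, `system_radix`, `DEFAULT_RADIX`, `children`, `parent`, `depth`) are hypothetical illustrations, not TORQUE's actual code or settings.

```python
from typing import List, Optional

# Hypothetical built-in fallback; the thread does not name a concrete default.
DEFAULT_RADIX = 2


def effective_radix(job_radix: Optional[int],
                    system_radix: Optional[int],
                    default: int = DEFAULT_RADIX) -> int:
    """Pick the radix: a per-job value (e.g. from qsub) beats the
    system-wide value, which beats the built-in default."""
    for value in (job_radix, system_radix):
        if value is not None:
            if value < 1:
                raise ValueError("radix must be at least 1")
            return value
    return default


def children(index: int, n_nodes: int, radix: int) -> List[int]:
    """MOM indices that the MOM at `index` contacts directly.
    Index 0 is the root (the job's first MOM, i.e. mother superior)."""
    first = index * radix + 1
    return list(range(first, min(first + radix, n_nodes)))


def parent(index: int, radix: int) -> Optional[int]:
    """MOM index that contacts `index`; None for the root."""
    return None if index == 0 else (index - 1) // radix


def depth(n_nodes: int, radix: int) -> int:
    """Number of hops from the root to the deepest MOM."""
    levels, reached, width = 0, 1, 1
    while reached < n_nodes:
        width *= radix   # MOMs added at the next level of the tree
        reached += width
        levels += 1
    return levels


if __name__ == "__main__":
    radix = effective_radix(job_radix=None, system_radix=4)
    print(radix, children(0, 3000, radix), depth(3000, radix))
    # prints: 4 [1, 2, 3, 4] 6
```

Under this sketch, 3,000 nodes take 6 hops with a radix of 4 and 11 hops with a radix of 2: a wider radix means fewer hops but more connections held by each MOM, which is one plausible reason different radix sizes give slightly different results. Presumably this bounded fan-out is also what makes TCP practical, since no single MOM has to open a connection to every node in the job.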
Ken