[torquedev] Other batch systems and Kerberos (was Fwd: Re: [Beowulf] network filesystem)

Sergio Gelato Sergio.Gelato at astro.su.se
Tue Mar 6 02:21:17 MST 2007


* Chris Samuel [2007-03-05 12:25:59 +1100]:
> Hi folks,
> 
> Given the recent discussion about GSSAPI I thought the following from the 
> Beowulf list with a reference to Kerberos support in LoadLeveler might be 
> of interest.

It doesn't say much. (And if one looks at the rest of that thread, one
sees that the focus is on small clusters and the best filesystem for
them. I don't doubt that *some* clusters can do quite well with just
NFSv3 and host-based authentication.) The CERN document is from ten years
ago; a lot has happened in the AFS world since then, most notably in
Kerberos 5 support.

One should point out that GSSAPI is about more than just Kerberos.
Globus GSI comes to mind here, and it's definitely relevant to batch
systems.

Some of us have an AFS infrastructure; others may someday want to take
advantage of the enhanced security of NFSv4. In either case, one needs
a way for batch jobs to have network credentials. (The current TORQUE
gssapi branch already provides this, at least for Kerberos 5 and AFS.
It's not quite as general or as secure as one might wish, but it's usable.)

Some of us have Internet-facing clusters. (At least some of our network/DNS 
people here would like to see RFC1918 addresses go away entirely.) And some
of our users will find it convenient to be able to submit batch jobs
directly from their laptops. This all argues for more robust authentication
than "trust the client and the network". GSSAPI is one way to provide this
(in a pluggable fashion: your GSSAPI library may support a variety of
mechanisms).

The solution to limited Kerberos ticket lifetimes is well known, and
involves renewable tickets. (Essentially, the ticket lifetime determines
how often one must generate a new session key, while the renewable lifetime
determines for how long one can go on doing so. The former should not exceed
a few hours; the latter can be months.) The job server needs either to
periodically renew tickets for jobs in the queue, or to be able to acquire
fresh ones when a job is started.
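
To make the interplay of the two lifetimes concrete, here is a minimal
Python sketch. It is not TORQUE code and it talks to no KDC; it only does
the arithmetic. The function names, the safety margin and the example
durations are all assumptions of mine, chosen for illustration.

```python
from datetime import datetime, timedelta


def renewal_schedule(issued, ticket_lifetime, renewable_lifetime, margin):
    """Return (renewal times, renew-till limit) for one renewable ticket.

    Hypothetical helper: each renewal happens `margin` before the current
    ticket expires, and no ticket may outlive issued + renewable_lifetime.
    """
    if not timedelta(0) <= margin < ticket_lifetime:
        raise ValueError("margin must be >= 0 and shorter than the ticket lifetime")
    renew_till = issued + renewable_lifetime
    # The first ticket expires after one ticket lifetime (capped at renew-till).
    expires = min(issued + ticket_lifetime, renew_till)
    schedule = []
    while expires < renew_till:
        renew_at = expires - margin
        schedule.append(renew_at)
        # A renewed ticket is valid for another ticket lifetime, again capped.
        expires = min(renew_at + ticket_lifetime, renew_till)
    return schedule, renew_till


def needs_fresh_ticket(issued, renewable_lifetime, job_end):
    """True if renewals alone cannot cover the job, so that the job server
    would have to acquire a fresh ticket instead."""
    return job_end > issued + renewable_lifetime


if __name__ == "__main__":
    t0 = datetime(2007, 3, 6, 2, 21)
    # Assumed policy: 4 hour tickets, renewable for 30 days, renew 30 min early.
    times, limit = renewal_schedule(
        t0, timedelta(hours=4), timedelta(days=30), timedelta(minutes=30)
    )
    print(len(times), "renewals needed; renewable until", limit)
    # A 3 month job outlives a 30 day renewable lifetime.
    print(needs_fresh_ticket(t0, timedelta(days=30), t0 + timedelta(days=90)))
```

The second function corresponds to the other branch above: once the
renewable lifetime runs out, renewing no longer helps and the server has
to be able to obtain new credentials some other way.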

I haven't looked at LL (or Condor) recently so I won't comment on that.
I did look at the GSSAPI support in Sun Grid Engine 6.0, and found
it rather too limited for my taste (and it seemed difficult to extend). 
That's why I came back to TORQUE.

> cheers,
> Chris
> 
> ----------  Forwarded Message  ----------
> 
> Subject: Re: [Beowulf] network filesystem
> Date: Mon, 5 Mar 2007
> From: John Hearns <john.hearns at streamline-computing.com>
> To: Chris Samuel <csamuel at vpac.org>
> 
> Chris Samuel wrote:
> > On Mon, 5 Mar 2007, Mark Hahn wrote:
> > 
> >> why V4?
> >> - security.  within a cluster, I don't see the point to, say, kerberos.
> > 
> > Agreed, not to mention all the pain of trying to get Kerberos tickets
> > passed through the queueing system and the fact that if you're running
> > a 3 month job it's going to be quite hard to persuade your Kerberos
> > admin to let you be able to create a ticket that lasts that long..
> > 
> Purely as a point of interest, since high energy physics labs use AFS 
> (and hence kerberos) they have already faced this one.
> The ticket is extended when a batch job is submitted:
> http://services.web.cern.ch/services/afs/arc.html#SECTION00040000000000000000
> 
> -------------------------------------------------------
> 
> -- 
>  Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
>  Victorian Partnership for Advanced Computing http://www.vpac.org/
>  Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
> 
> 

