[torquedev] renewing credentials

Sergio Gelato Sergio.Gelato at astro.su.se
Wed Mar 7 14:51:01 MST 2007


* Garrick Staples [2007-03-06 13:39:49 -0700]:
> On Tue, Mar 06, 2007 at 09:14:57PM +0100, Sergio Gelato alleged:
> > I don't think so. It's quite easy for a job to do a
> > 	(while kinit -Rf; do sleep 30000; done) &
> > or equivalent (e.g., Russ Allbery's krenew) on each node. Indeed it would 
> > be nice for pbs_mom to set that up on the user's behalf and to clean up at 
> > the end of the job. Isn't this what the prologue and epilogue scripts
> > are for?
> 
> I thought the pro/epilog bits were no longer necessary.  When the gssapi
> patch was originally submitted, I was the one that rejected the idea of
> pro/epilog scripts managing the key renewals.
> 
> I had thought the pbs_mom bits required to handle this were already
> checked in to the gssapi branch.
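The per-node loop suggested in the quoted message can be packaged as a small shell function. This is a minimal sketch, not Torque code: the name `renew_loop` and its argument convention are hypothetical. Like the original one-liner, it runs the renewal command, sleeps, and repeats. It stops the first time the command fails. With `kinit -R`, that happens, for example, once the ticket is past its renewable lifetime.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque): repeat a renewal command
# until it fails.
# Usage: renew_loop INTERVAL_SECONDS COMMAND [ARGS...]
renew_loop() {
    interval=$1
    shift
    # Run the command first, then sleep; a non-zero exit status
    # (e.g. kinit -R on a ticket that can no longer be renewed) ends the loop.
    while "$@"; do
        sleep "$interval"
    done
}
```

On a node, `renew_loop 30000 kinit -Rf &` behaves like the `(while kinit -Rf; do sleep 30000; done) &` line quoted above.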

Maybe I'm misreading the code, but my impression is that the only
renewals at the moment are done while the job sits waiting in the queue.
Specifically, the only call to pbsgss_renew_creds() is from
renew_job_credentials(), which is only mentioned in req_quejob() as
 set_task(WORK_Timed,time((time_t *)0)+3600*3,renew_job_credentials,jobidcopy)
and this appears in an #ifndef PBS_MOM block, i.e. it is compiled into
pbs_server but not into pbs_mom.

There is no question that credentials must be periodically refreshed
by pbs_server while the job is queued. (This, I believe, is happening.
In an ugly way, with hard-coded calls to kinit and a fixed 3-hour refresh 
rate, but we can tidy that up later on.) Once the job starts executing,
however, the execution side (pbs_mom, or the job itself) could in principle
take over that responsibility. Which doesn't mean that it should...

At least for the kinds of credentials I'm familiar with, the refreshing
doesn't require superuser privileges. And where AFS is involved it needs
to happen in the same PAG as the user's job. Which is the better place to
do it: pbs_mom, or a separate daemon (say, krenew) that's launched
by prologue.user and terminated by epilogue.user?
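The "separate daemon" option can be sketched as a pair of hypothetical helpers, not part of Torque. `start_renewer` would be called from prologue.user and `stop_renewer` from epilogue.user, with both sourced from a shared file. The sketch rests on three assumptions. First, the job id arrives as the first script argument. Second, PID files can be kept in `$RENEW_PIDDIR`, a made-up variable. Third, and most important, prologue.user runs with the job's credential cache and, under AFS, in the job's PAG. That last point is exactly what would need to be verified.

```shell
#!/bin/sh
# Hypothetical prologue.user / epilogue.user helpers (not Torque code).
# Assumes the job id is passed as the first argument and that PID files
# may be stored in $RENEW_PIDDIR (an illustrative variable).
RENEW_PIDDIR=${RENEW_PIDDIR:-${TMPDIR:-/tmp}}

# From prologue.user: start a background renewal loop for the job.
# Usage: start_renewer JOBID INTERVAL_SECONDS COMMAND [ARGS...]
start_renewer() {
    pidfile="$RENEW_PIDDIR/renew.$1.pid"
    interval=$2
    shift 2
    # Detach stdio so the loop does not hold the prologue's output open.
    ( while "$@"; do sleep "$interval"; done ) </dev/null >/dev/null 2>&1 &
    # $! is the PID of the backgrounded subshell running the loop.
    echo $! > "$pidfile"
}

# From epilogue.user: stop the loop (if any) and remove its PID file.
# Usage: stop_renewer JOBID
stop_renewer() {
    pidfile="$RENEW_PIDDIR/renew.$1.pid"
    [ -f "$pidfile" ] || return 0
    # Terminate the loop's subshell; a sleep already in progress is
    # left to expire on its own but runs no further renewals.
    kill "$(cat "$pidfile")" 2>/dev/null
    rm -f "$pidfile"
}
```

For example, prologue.user might call `start_renewer "$1" 30000 kinit -Rf` and epilogue.user `stop_renewer "$1"`. Under AFS the renewal command would also need to refresh the token, e.g. `sh -c 'kinit -R && aklog'`.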

OK, so req_cpyfile() may need the credentials at the end of the job in
order to copy the output files. At the moment it reuses the job's
ccache and sets up a new AFS PAG for itself. Since it reuses the job's
ccache it doesn't really care how the credentials have been refreshed.
That doesn't help much in answering my question.
