[torqueusers] how is the torque renewal scripts supposed to work?
Mike.Coyne at PACCAR.com
Fri Jul 9 09:37:58 MDT 2010
As I mentioned in a previous reply, I use Russ Allbery's kstart package http://www.eyrie.org/~eagle/software/kstart/ (or a slight derivation of it) to manage my Kerberos server and client "job" tickets. With the gssapi branch, everything needs a ticket to communicate with the server and mom. Say you set KRB5CCNAME=file://my-server-credpath and then use k5start to run pbs-server, setting the credential cache to be the same and getting a ticket from your krb5.keytab, say for host/my.full.cred.name at MY-REALM. This name needs to be an admin in pbs-server (set with qmgr). The same goes for pbs-mom. You only need one k5start keeping your ticket cache current, so you can just fire up mom with the same KRB5CCNAME in her environment as the pbs-server you just started on the same machine. Same goes for maui ... ;)
As for the jobs: when a user does a qsub, their ticket is used to authenticate to pbs-server and is stored in the ....server-priv/creds directory. There is a built-in task in pbs-server that will try to renew the ticket every 3 hours, I believe. When the job runs, it uses the credentials from the creds directory to authenticate to the mom on the compute node. If the gssapi auth succeeds, a copy of the credentials is delegated for the user to use, and it will try to set your AFS PAG and get your AFS token. At that point it starts the job.
If the user managed to authenticate to pbs-server some other way, say with pbs-iff, then the job will never run, as no credential will be available to authenticate to mom, and the job will sit in the queue. In that sense pbs-iff is not your friend. In the large patch I sent out several weeks ago, I set pbs-iff to work only while pbs-server is running with -t create, and to be rejected after that, so you can configure things as root the first time. I digress.
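To make the "one k5start, one shared cache" idea above concrete, here is a minimal Python sketch that only builds the command line and the daemon environment; it does not run anything. The cache path, keytab path and the FILE: cache syntax are assumptions for illustration, not values taken from the gssapi branch, and the k5start options shown (-b, -f, -u, -k, -K) should be checked against the k5start man page on your system.

```python
import os

# Hypothetical site values, chosen only for illustration.
CCACHE = "FILE:/var/spool/torque/krb5cc_pbs"   # cache shared by pbs-server, mom, maui
KEYTAB = "/etc/krb5.keytab"
PRINCIPAL = "host/my.full.cred.name@MY-REALM"  # must be a pbs-server admin (qmgr)


def k5start_argv(ccache=CCACHE, keytab=KEYTAB, principal=PRINCIPAL, minutes=10):
    """Build a k5start command that keeps one ticket cache current."""
    return [
        "k5start",
        "-b",                # run in the background
        "-f", keytab,        # authenticate from the keytab
        "-u", principal,     # client principal to get a ticket for
        "-k", ccache,        # ticket cache to maintain
        "-K", str(minutes),  # wake up this often and renew if needed
    ]


def daemon_env(ccache=CCACHE, base=None):
    """Environment for pbs-server, mom or maui: same KRB5CCNAME for all of them."""
    env = dict(os.environ if base is None else base)
    env["KRB5CCNAME"] = ccache
    return env


if __name__ == "__main__":
    print(" ".join(k5start_argv()))
    print("KRB5CCNAME=" + daemon_env()["KRB5CCNAME"])
```

Each daemon on the machine would then be started with `daemon_env()` as its environment, so only one k5start process is needed per host.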
After the job starts, its credential is stored under, say, /tmp/myjob...krbcache... You need to keep the credential cache fresh for the user, or the user has to do it in their script. The default gssapi version leaves this up to you. I chose to maintain the cache for the user, and to do this I started the job by prepending krenew to the command string. It gets a little tricky to add the krenew options to the command line and then fire up the job shell with all its options, i.e. /usr/bin/krenew -krenopt.... -- /bin/sh myjob.sh . I opted to make a $babysiter option that prepends /usr/bin/krenew to the command line, and in my "special" build of krenew I just pulled the options from the environment rather than the command line, so things stayed simple.
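The prepending step described above can be sketched in a few lines of Python. This is not the code from my patch; it is only an illustration of the shape of the final command. The environment variable name KRENEW_OPTS is hypothetical (a stock krenew takes its options on the command line, not from the environment), so here the wrapper reads the variable and places the options before the "--" separator.

```python
import shlex

KRENEW = "/usr/bin/krenew"


def wrap_with_krenew(job_argv, env):
    """Prepend krenew (plus any options) to the job's command line.

    Options come from the hypothetical KRENEW_OPTS variable so the caller
    never has to splice them into the job's own argument list.
    """
    opts = shlex.split(env.get("KRENEW_OPTS", ""))
    # "--" tells krenew that everything after it is the command to run.
    return [KRENEW, *opts, "--", *job_argv]


if __name__ == "__main__":
    cmd = wrap_with_krenew(["/bin/sh", "myjob.sh"], {"KRENEW_OPTS": "-K 60 -t"})
    print(cmd)
```

Keeping the options in one place means the job shell and its arguments pass through untouched, which is the reason for pulling them out of the command string in the first place.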
Now, running something with the TM interface gets a little more interesting. The only place the user's creds are normally available is on the "mother superior" node, so to fire off a task on another node with the user's credentials (say, opening up AFS for the user), the credentials need to be put on that node first, because the TM interface currently doesn't allow for the same gssapi auth that pbs-server used to send the job to the mother superior. So I added a loop, prior to sending the job to the mother superior, that does an auth to each node in the node list, leaving a copy of the user's creds available to the user on each selected node. Then, when the user accesses the node, it sets the user's PAG and gets his AFS token prior to running the user's command.
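The per-node loop described above looks roughly like the following Python sketch. It is an illustration only, not the actual change in pbs-server (which is C). It assumes the node list arrives as a TORQUE-style exec_host string such as "node1/0+node1/1+node2/0", and `authenticate` is a hypothetical stand-in for the gssapi auth that leaves a copy of the user's creds on a node.

```python
def unique_nodes(exec_host):
    """Return hostnames from an exec_host string, in order, without duplicates."""
    nodes = []
    for slot in exec_host.split("+"):
        host = slot.split("/", 1)[0]  # drop the "/cpu-index" part
        if host and host not in nodes:
            nodes.append(host)
    return nodes


def preload_credentials(exec_host, authenticate):
    """Authenticate to every node in the job's node list before the job is sent.

    `authenticate(host)` is a hypothetical callback returning True on success.
    Returns (nodes, failed) so the caller can decide whether to hold the job.
    """
    nodes = unique_nodes(exec_host)
    failed = [host for host in nodes if not authenticate(host)]
    return nodes, failed


if __name__ == "__main__":
    nodes, failed = preload_credentials(
        "node1/0+node1/1+node2/0",
        authenticate=lambda host: host != "node2",  # pretend node2 is unreachable
    )
    print(nodes, failed)
```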
Again I digress. I hope this helps explain some of the credential stuff going on ... One thing: the credential update mechanism gets confused when you restart pbs-server, so sometimes you need to clean up the creds and/or copy in a good cred for a user whose cred may have expired. Also, cleanup is not so good; it could use a better cleanup after the job finishes, i.e. rm /tmp/usercreds... on all the compute nodes after the job completes and the results have been returned, as the scp will need to use the user's pass-through credentials.
From: torqueusers-bounces at supercluster.org on behalf of Andreas Davour
Sent: Fri 7/9/2010 6:37 AM
To: torqueusers at supercluster.org
Subject: [torqueusers] how is the torque renewal scripts supposed to work?
After the problems I posted yesterday I think it's clear that I have a very
vague idea of how things are supposed to work.
So, how, when and where are the client and server renewal scripts distributed
with the gssapi branch really supposed to be run?
PDC Center for High Performance Computing
CSC School of Computer Science and Communication
KTH Royal Institute of Technology
SE-100 44 Stockholm, Sweden
"A satellite, an earring, and a dust bunny are what made America great!"