[torqueusers] All *.ER and *.OU stucked in undelivered

Jacqueline Scoggins jscoggins at lbl.gov
Mon Mar 20 16:59:19 MST 2006


One thing I have found that can cause this is the trust relationship
between the hosts, as set in the MOM config file.  Does each node have
the entries that tell it to trust the hosts on your network that scp or
rcp will be used with?

Check the config file on each node, for example:
$restricted  <ip to the pbs_server>
$clienthost <ip to the pbs_server>
$clienthost <ip to the nfs mounted filesystem>
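
To see quickly which of these entries a node already has, something
like the following could be run against its config file.  This is only
a sketch: check_mom_config is a made-up helper name, the address in the
demo is a placeholder, and the demo runs on a throwaway file rather
than a real config (possible locations are mentioned in the quoted
message further down).

```shell
# Hypothetical helper: report whether the $restricted and $clienthost
# directives appear (uncommented) in the given config file.
check_mom_config() {
    config_file=$1
    for name in restricted clienthost; do
        # \$ matches a literal dollar sign at the start of the directive
        if grep -Eq '^[[:space:]]*\$'"$name"'([[:space:]]|$)' "$config_file"; then
            echo "\$$name: present"
        else
            echo "\$$name: missing"
        fi
    done
}

# Demo on a temporary file with a placeholder address (assumption).
demo_config=$(mktemp)
printf '%s\n' '$clienthost 192.0.2.10' > "$demo_config"
check_mom_config "$demo_config"
rm -f "$demo_config"
```

On a real node you would pass the path of that node's own config file
instead of the temporary one.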

Also, is your home directory cross-mounted on each of the nodes?  If
so, create a .shosts file to let the nodes communicate without having
to set up the authorized_keys file.  Use it as you would the RSA
authentication method with rcp.  If your nodes are on a private subnet,
this should not be a security issue.  If you do want to use the
authorized_keys file as suggested below, I think you want to do it the
other way around.  You are not having trouble with the master talking
to the nodes, but with the nodes talking to the master host: the master
does not trust the nodes.  I would need to see how your home
directories are set up.

If you have a separate home on each node with different password
entries, then you will need to run ssh-keygen on each node and add each
node's new public key to the master node's .ssh/authorized_keys file.
This seems like too much work, and I think the .shosts approach would
be better.
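
For the .shosts route, the file is just a list of trusted hostnames,
one per line.  A minimal sketch, assuming made-up node names and a
hypothetical helper called write_shosts; the demo writes to a temporary
file rather than a real ~/.shosts, and sshd on the receiving host must
also be configured to honor host-based authentication, which is not
covered here.

```shell
# Hypothetical helper: write one hostname per line into an
# .shosts-style file and make it readable by the owner only.
write_shosts() {
    target=$1
    shift
    : > "$target"                 # start from an empty file
    for host in "$@"; do
        printf '%s\n' "$host" >> "$target"
    done
    chmod 600 "$target"
}

# Demo with placeholder node names (assumptions, not from this thread).
demo_shosts=$(mktemp)
write_shosts "$demo_shosts" master node01 node02
cat "$demo_shosts"
rm -f "$demo_shosts"
```

With a cross-mounted home directory the file only has to be written
once, which is the main reason this is less work than collecting a key
from every node.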

Thanks

Jackie

On Mon, 2006-03-20 at 15:02, Josephine Palencia wrote:
> Hi Hristo,
> 
> 
> One place to look is */mom/config on each of your nodes.
> 
> You can look at /var/spool/pbs/mom/config or /usr/spool/pbs/mom/config of
> each of your nodes and add for instance
> 
> $usecp *:/home /home
> 
> 
> josephine
> 
> 
> On Mon, 20 Mar 2006, Hristo Iliev wrote:
> 
> > On Tue, 2006-03-14 at 12:47 +0100, Torsten Bruhn wrote:
> > > Hello,
> > >
> > > I am new to Torque and Maui and tried to set them up on our cluster.
> > > Torque is installed with the --with-scp option and scp works in
> > > both directions. Jobs in the queue start and finish normally,
> > > but the *.ER and *.OU files are not copied to the user's directory;
> > > they get stuck in the undelivered directory. There are no error
> > > messages in the log files and no mails with an error message, and I
> > > have no clue what is wrong. Perhaps someone here can help?
> > >
> > > Greets,
> > > --
> > > Dipl.-Chem. Torsten Bruhn
> >
> >
> > Hello,
> >
> > Do you use public keys for SSH, or do you enter a password each time?
> > For scp to work unattended, you need to use public key
> > authentication with an *empty* key passphrase.
> >
> > You can generate your public key by executing the following command:
> >
> > ssh-keygen -t dsa -b 1024
> > (just hit Enter when asked for passphrase)
> >
> > Then you will get two files in the .ssh subdirectory of your home dir:
> > id_dsa (keep this file safe - this is your secret key)
> > id_dsa.pub (your public key file)
> >
> > Now all you need to do is to append the content of id_dsa.pub to the end
> > of the authorized_keys file (found once again in the .ssh subdir) on
> > each computing node:
> >
> > (execute the following commands from your .ssh subdir)
> >
> > cat id_dsa.pub >> authorized_keys
> > cat id_dsa.pub | ssh login@hostname "cat - >> ~/.ssh/authorized_keys"
> > (substitute login with your login and hostname with the name of each
> > computing node. You can of course transfer id_dsa.pub to the remote
> > hosts, log in there and cat its content to authorized_keys)
> >
> > Now you have to log in over SSH from each of your compute nodes back to
> > the machine from which you submit your job files and accept the server key
> > fingerprint. The reason for doing so is that scp will ask you to confirm
> > that you trust the SSH server fingerprint the first time you connect to
> > the server from your compute nodes. You can confirm that everything is
> > OK when the following command "scp somefile mainnode:~/" completes
> > without asking you for password, passphrase or fingerprint trust.
> >
> > In my opinion it is easier to set up NFS shared home folders and to use
> > NIS for central administration of user accounts. Then Torque can use
> > simple "cp" to transfer files back and forth.
> >
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
> >


