[torqueusers] hostbased ssh mini-howto

widyono at seas.upenn.edu widyono at seas.upenn.edu
Thu Nov 3 10:38:03 MST 2005


Greetings all,

Here is a summary (i.e., a mini-HOWTO that hasn't been cleaned up) of using
OpenSSH as the process transport on Linux clusters under Torque.  Hopefully it
will help others.  I use Fermi Scientific Linux 4; YMMV.  This is much more
than a couple of paragraphs, but it may fill in some dark areas for
sysadmin-averse cluster operators.

Regards,
Dan Widyono
Liniac Project
University of Pennsylvania


============================================================================

There are two general methods.  The first will not be discussed fully here,
but rather mentioned in passing: each user gets an empty-passphrase key,
whose public half is copied into their authorized_keys file.  We used this
for several years, and while it certainly works, it is awfully ugly to
manage.  Tip: use id_rsa_pbs as the key name so as not to interfere with
users who have their own ssh keys set up (for external connections).

In /etc/ssh/ssh_config use something like this:

Host node*
	IdentityFile ~/.ssh/id_rsa_pbs
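
For completeness, the per-user key setup can be sketched like so (this is
just one way to do it; the id_rsa_pbs name matches the ssh_config stanza
above, everything else is an assumption on my part):

```shell
# One-time setup, run as each user: a cluster-only RSA key with an empty
# passphrase (-N "") so batch jobs never prompt, authorized for logins
# back into the cluster.
set -e
KEYDIR="$HOME/.ssh"
mkdir -p "$KEYDIR"
chmod 700 "$KEYDIR"
ssh-keygen -q -t rsa -N "" -f "$KEYDIR/id_rsa_pbs"
# Append the public half so the key is accepted on every node sharing $HOME.
cat "$KEYDIR/id_rsa_pbs.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
```

With shared home directories this one append is enough; otherwise you would
push authorized_keys out to every node.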

============================================================================

We are moving toward the second method, hostbased authentication, but this
was initially set back by awful debugging output from OpenSSH and poor
existing documentation on the web.  I finally bit the bullet, organized my
thoughts, and tested a configuration on a test cluster.  I ask for your
comments and feedback (*ESPECIALLY* regarding compression and cipher choices
and how they affect your throughput and latency with non-MPI but
intercommunicating tasks).

============================================================================

Hostbased ssh setup, with torque access control and minor performance tweaks:


On the SSHD server side (which means everywhere, BUT!!! the head node with
external logins should have a more secure sshd_config):

	/etc/ssh/shosts.equiv
		headnode.internal.domain
		node1.internal.domain
		node2.internal.domain
		...
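
Maintaining that list by hand gets old quickly.  A throwaway sketch of
generating it from a plain node inventory (file names here are made up, and
the example writes to /tmp rather than /etc so you can try it safely):

```shell
# Build shosts.equiv from a one-hostname-per-line inventory file.
# Both paths are stand-ins; in production the output would be
# /etc/ssh/shosts.equiv, pushed to every node.
set -e
NODELIST=/tmp/pbs_nodelist.example
cat > "$NODELIST" <<'EOF'
node2.internal.domain
node1.internal.domain
headnode.internal.domain
node1.internal.domain
EOF
EQUIV=/tmp/shosts.equiv.example
# shosts.equiv wants one trusted client hostname per line; sort and dedupe.
sort -u "$NODELIST" > "$EQUIV"
chmod 644 "$EQUIV"
```

Adding a node then becomes a one-line change to the inventory plus a push.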

	/etc/ssh/sshd_config  (((  ON INTERNAL NODES ONLY!!!  )))
		# Safety valve (root)
		PubkeyAuthentication		yes
		# Main component
		HostbasedAuthentication		yes
		# /etc/pbs_sshauth with pam_listfile.so (see below)
		UsePAM				yes
		# Security measures
		IgnoreUserKnownHosts		yes
		IgnoreRhosts			yes
		PermitUserEnvironment		no
		UseLogin			no
		PermitRootLogin			without-password
		# Reduce latency for MPI
		LogLevel			ERROR
		Ciphers				blowfish-cbc
		Compression			no
		Protocol			2
		# You might want to change the following on the head
		# node, depending on your external network environment
		# and group preferences
		ChallengeResponseAuthentication	no
		PasswordAuthentication		no
		KerberosAuthentication		no
		GSSAPIAuthentication		no
		UseDNS				no
		PrintMotd			no
		PrintLastLog			no
		X11Forwarding			no
		# on head node this really should be yes
		StrictModes			no
		# REMOVE / COMMENT OUT SFTP SUBSYSTEM ON COMPUTE NODES
		# Subsystem       sftp    /usr/libexec/openssh/sftp-server

	/etc/sysconfig/sshd
		# Turn off IPV6 addresses
		OPTIONS="-4"

	/etc/pam.d/sshd  (modified to use pam_listfile.so for access control)
		#%PAM-1.0
		# obviously on compute nodes only
		auth       required     pam_stack.so service=system-auth
		auth       required     pam_nologin.so
		account    required     pam_stack.so service=system-auth
		account    sufficient   pam_access.so
		account    required     pam_listfile.so file=/etc/pbs_sshauth onerr=fail sense=allow item=user
		password   required     pam_stack.so service=system-auth
		session    required     pam_stack.so service=system-auth
		#
		# original, for sake of comparison
		#auth       required     pam_stack.so service=system-auth
		#auth       required     pam_nologin.so
		#account    required     pam_stack.so service=system-auth
		#password   required     pam_stack.so service=system-auth
		#session    required     pam_stack.so service=system-auth


	$PBS_DIR/mom_priv/prologue   AND   prologue.parallel
		#!/bin/sh
		# obviously on compute nodes only; $2 is the job owner's username
		/bin/rm -f /etc/pbs_sshauth ; echo "$2" > /etc/pbs_sshauth ; exit 0

	$PBS_DIR/mom_priv/epilogue   AND   epilogue.parallel
		#!/bin/sh
		# obviously on compute nodes only; empty the file so no user matches
		/bin/rm -f /etc/pbs_sshauth ; echo "" > /etc/pbs_sshauth ; exit 0
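
A toy run-through of how those scripts gate access (using a file in /tmp in
place of /etc/pbs_sshauth; the shell functions are mine, but the bodies are
the prologue/epilogue one-liners above, where $2 is the job owner):

```shell
# Simulate the prologue/epilogue dance against a scratch copy of the
# pam_listfile gate file; sshd's PAM stack only admits names listed in it.
set -e
AUTHFILE=/tmp/pbs_sshauth.example          # stand-in for /etc/pbs_sshauth
prologue() { rm -f "$AUTHFILE"; echo "$2" > "$AUTHFILE"; }  # $1=jobid $2=user
epilogue() { rm -f "$AUTHFILE"; echo "" > "$AUTHFILE"; }
prologue 1234.headnode alice
grep -qx alice "$AUTHFILE"                 # alice may now ssh to this node
epilogue 1234.headnode alice
! grep -qx alice "$AUTHFILE"               # and now she may not
```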



On the SSH Client side (everywhere):

	/etc/ssh/ssh_config
		FallBackToRsh			no
		EnableSSHKeysign		yes
		Host	node* headnode.internal.domain headnode
			BatchMode			yes
			ConnectionAttempts		5
			ForwardX11			no
			HostbasedAuthentication		yes
			PreferredAuthentications	hostbased
			CheckHostIP			no
			UserKnownHostsFile		/dev/null
			Ciphers				blowfish-cbc
			Compression			no



Maintenance: shosts.equiv needs to be updated when new nodes are added.  You
could use netgroups for this, either via NIS or a netgroup file (not tested
by me, but I've read of others doing so on Linux).  You probably want to add
something at bootup to clear out /etc/pbs_sshauth.  Revisit the
cipher/compression choices as improvements become available, for performance
gains.
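
For the bootup cleanup, something like this in rc.local should do (sketched
here against a file in /tmp instead of the real /etc/pbs_sshauth so it can
be tried without root):

```shell
# At boot, truncate the pam_listfile gate so a node that crashed mid-job
# cannot come back up still authorizing the last job's owner.
set -e
AUTHFILE=/tmp/pbs_sshauth.boot.example   # in rc.local: /etc/pbs_sshauth
echo stale-user > "$AUTHFILE"            # simulate leftover state after a crash
: > "$AUTHFILE"                          # truncate: no username matches now
chmod 644 "$AUTHFILE"
```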

