[torqueusers] Hardening of Opteron Scyld GigE Cluster
jbernstein at penguincomputing.com
Thu Oct 25 15:25:53 MDT 2007
Hi Gordon, always nice to see a new customer on the list!
> I'm getting an Opteron Scyld GigE cluster from Penguin,
> running Scyld and Torque.
> When I read about it, it says that all the messaging is done via daemons
> _server to _mom, so there is no need to worry about things like rsh or scp
> being needed.
> But then I see things like this, in archives:
> .....> It appears to be some kind of permissions error
> My guess would have to be that PBS is trying to copy those jobs back via rcp
> or scp and that side of things hasn't been set up correctly.
> Certainly with our Torque builds I always use:
> ./configure --with-scp
> to make sure it doesn't try and use rcp (even though rcp is just a symlink
> to scp on our boxes).
> As part of hardening, like the folks above, I get rid of things like rcp.
> And I make sure the net parameters can't do forward and redirect, along with
> many other things.
As of Scyld ClusterWare 4.1.4 (which your cluster will likely ship
with), TORQUE is configured in the traditional way, which means that a
pbs_mom runs on every compute node in the cluster.
For non-MPI jobs, TORQUE simply asks the pbs_mom on the compute node
assigned to the job to fork the job and execute it on that node.
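
To make the non-MPI case concrete, here is a minimal sketch of a
single-node job script and its submission. The file name hello.pbs, the
job name, and the resource request are assumptions picked for
illustration, not Scyld defaults; the #PBS directives, PBS_O_WORKDIR,
and qsub are standard TORQUE.

```shell
# Hypothetical job script name, chosen only for this example
job_script=hello.pbs

# Write a minimal single-node job script; the pbs_mom on the assigned
# node forks and runs the body of this script
cat > "$job_script" <<'EOF'
#!/bin/sh
#PBS -N hello
#PBS -l nodes=1
#PBS -j oe
cd "$PBS_O_WORKDIR"
hostname
EOF

# Submit only if qsub is on the PATH (i.e. on a host with TORQUE installed)
if command -v qsub >/dev/null 2>&1; then
    qsub "$job_script"
else
    echo "qsub not found; job script written to $job_script"
fi
```

Nothing in this path calls rsh, rcp, or scp to start the job; the
pbs_server hands the job to the pbs_mom directly.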
For MPI jobs linked against the MPICH libraries that come with Scyld, RSH
and its associated commands aren't used.
In fact, RSH is disabled by default on Scyld clusters. You only need to
enable it for applications that absolutely depend on it.
> But I'm worried I'll break the "communication" between server and compute
Note that users do not log in to Scyld compute nodes. Instead, they
launch jobs either through TORQUE or with Scyld commands such
as bpsh and beorun.
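
As a rough sketch of what that looks like from the master node, the
snippet below runs a command on one compute node with bpsh. The node
number 0 is an assumption for illustration; substitute a node that is
actually up on your cluster.

```shell
node=0   # hypothetical compute node number

if command -v bpsh >/dev/null 2>&1; then
    # Run a command on the compute node from the master, without rsh or ssh
    bpsh "$node" hostname
    status="ran"
else
    echo "bpsh not found; this is probably not a Scyld master node"
    status="skipped"
fi
```

The guard is only there so the snippet is harmless on a machine without
Scyld installed.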
If you have any questions after you receive your cluster, please don't
hesitate to contact Penguin's tech support team which is support at