[torqueusers] bad torque nodes file?
Albert Everett
aeeverett at ualr.edu
Thu Mar 19 07:48:57 MDT 2009
Setup: Rocks 4.3 cluster with the pbs roll and the moab roll.
I have been reading the docs at
http://www.clusterresources.com/torquedocs21/nodeconfig.shtml
I'm trying to get IPoIB working with Star-P and torque. The IPoIB network
is 192.168.2.0/24 and the device is ib0. The ib0 interfaces are registered
in the Rocks database and show up in DNS and /etc/hosts.
When running outside torque/moab, Star-P is happy with a machines file
with lines like:
compute-1-1 ifhn=192.168.2.254
Using normal torque nodes file /opt/torque/server_priv/nodes:
[root@hpc1-cpsc server_priv]# head nodes
compute-1-1.local np=8
compute-1-2.local np=8
compute-1-3.local np=8
compute-1-4.local np=8
compute-1-5.local np=8
compute-1-6.local np=8
compute-1-7.local np=8
compute-1-8.local np=8
compute-1-9.local np=8
compute-1-10.local np=8
...
With this nodes file, Star-P uses only the eth0 interfaces for MPI. I
thought I might tweak the torque nodes file to look like:
[root@hpc1-cpsc server_priv]# head nodes-ifhn
compute-1-1.local np=8 ifhn=192.168.2.254
compute-1-2.local np=8 ifhn=192.168.2.253
compute-1-3.local np=8 ifhn=192.168.2.252
compute-1-4.local np=8 ifhn=192.168.2.251
compute-1-5.local np=8 ifhn=192.168.2.250
compute-1-6.local np=8 ifhn=192.168.2.249
compute-1-7.local np=8 ifhn=192.168.2.248
compute-1-8.local np=8 ifhn=192.168.2.247
compute-1-9.local np=8 ifhn=192.168.2.246
compute-1-10.local np=8 ifhn=192.168.2.245
...
and restart the pbs_server service.
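For reference, the nodes-ifhn listing can be derived from the original nodes file with a short script. This is only a minimal sketch: it assumes the IPoIB address of compute-1-N is 192.168.2.(255 - N), a pattern that matches the ten entries shown above but is not confirmed for the rest of the cluster, and the function name add_ifhn is hypothetical.

```python
import re

# Matches Rocks-style compute node names such as "compute-1-10.local".
NODE_RE = re.compile(r"^compute-(\d+)-(\d+)\.local$")


def add_ifhn(nodes_lines, subnet="192.168.2"):
    """Append an ifhn=<IPoIB address> field to each compute node line.

    Assumption (inferred from the ten sample lines only): the last octet
    is 255 minus the node index, e.g. compute-1-1 -> .254.
    """
    out = []
    for line in nodes_lines:
        fields = line.split()
        match = NODE_RE.match(fields[0]) if fields else None
        if match is None:
            # Leave blank lines, comments, and unrecognized names untouched.
            out.append(line.rstrip("\n"))
            continue
        index = int(match.group(2))
        out.append(" ".join(fields + [f"ifhn={subnet}.{255 - index}"]))
    return out


if __name__ == "__main__":
    sample = ["compute-1-1.local np=8", "compute-1-10.local np=8"]
    for entry in add_ifhn(sample):
        print(entry)
```

The script only reproduces the text of the modified file; it does not make pbs_server accept the ifhn= field.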
The pbs_server service doesn't like the ifhn string in the nodes file,
so I put the original back in place and pbs_server is now happy.
Any idea why pbs_server barfs on the nodes file with ifhn= in it?