[torqueusers] bad torque nodes file?

Albert Everett aeeverett at ualr.edu
Thu Mar 19 07:48:57 MDT 2009


Rocks 4.3 cluster, pbs roll, moab roll

Reading docs at

http://www.clusterresources.com/torquedocs21/nodeconfig.shtml

Trying to get IPoIB working with Star-P and torque. The IPoIB network is
192.168.2.0/24 and the device is ib0. The ib0 interfaces are registered in
the Rocks database and show up in DNS and /etc/hosts.

When running outside torque/moab, Star-P is happy with a machines file
with lines like:

compute-1-1 ifhn=192.168.2.254

Using normal torque nodes file /opt/torque/server_priv/nodes:

[root@hpc1-cpsc server_priv]# head nodes
compute-1-1.local np=8
compute-1-2.local np=8
compute-1-3.local np=8
compute-1-4.local np=8
compute-1-5.local np=8
compute-1-6.local np=8
compute-1-7.local np=8
compute-1-8.local np=8
compute-1-9.local np=8
compute-1-10.local np=8
...

With this nodes file, Star-P uses only the eth0 interfaces for MPI. I
thought I might tweak the torque nodes file to look like:

[root@hpc1-cpsc server_priv]# head nodes-ifhn
compute-1-1.local np=8 ifhn=192.168.2.254
compute-1-2.local np=8 ifhn=192.168.2.253
compute-1-3.local np=8 ifhn=192.168.2.252
compute-1-4.local np=8 ifhn=192.168.2.251
compute-1-5.local np=8 ifhn=192.168.2.250
compute-1-6.local np=8 ifhn=192.168.2.249
compute-1-7.local np=8 ifhn=192.168.2.248
compute-1-8.local np=8 ifhn=192.168.2.247
compute-1-9.local np=8 ifhn=192.168.2.246
compute-1-10.local np=8 ifhn=192.168.2.245
...

and restart the pbs_server service.
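
For reference, the nodes-ifhn listing above follows a simple pattern and
could be generated from the original nodes file rather than typed by hand.
A minimal sketch, assuming node compute-1-N maps to 192.168.2.(255 - N);
that mapping matches the ten entries shown but is only inferred from them,
and the function name add_ifhn is hypothetical. On a real cluster the ib0
address should be looked up rather than computed.

```python
# Sketch only: derive an ifhn-style nodes file from the plain one.
# ASSUMPTION: compute-1-N has IPoIB address 192.168.2.(255 - N). This
# matches the entries shown in this post but is not verified beyond them.
import re

IPOIB_PREFIX = "192.168.2."  # IPoIB network 192.168.2.0/24 from this post


def add_ifhn(lines):
    """Append ifhn=<IPoIB address> to each 'compute-1-N.local ...' line."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        m = re.match(r"compute-1-(\d+)\.local\b", line)
        if m is None:
            out.append(line)  # leave unrecognized lines untouched
            continue
        n = int(m.group(1))
        out.append(f"{line} ifhn={IPOIB_PREFIX}{255 - n}")
    return out


if __name__ == "__main__":
    sample = ["compute-1-1.local np=8", "compute-1-10.local np=8"]
    print("\n".join(add_ifhn(sample)))
```

This only reproduces the file layout; it says nothing about whether
pbs_server accepts the extra ifhn= field.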

The pbs_server service doesn't like the ifhn string in the nodes file,
so I put the original back in place and pbs_server is now happy.

Any idea why pbs_server barfs on the nodes file with ifhn= in it?

