[torqueusers] Transforming node names in $PBS_NODEFILE and $PBS_GPUFILE

Dave Ulrick d-ulrick at comcast.net
Tue Sep 4 13:24:57 MDT 2012


Our 60-node HPC cluster has two local networks: GigE and InfiniBand. 
The host names are cnxx (e.g., cn01 or cn60) for the GigE IPs and icnxx 
for the IB IPs. The TORQUE (3.0.4) pbs_server and pbs_moms are 
configured to use the GigE host names, so $PBS_NODEFILE and 
$PBS_GPUFILE naturally contain the GigE node names.

I am trying to figure out a way to populate these files with the IB node 
names so MPI traffic will use IB instead of GigE. I've already tried to 
reconfigure Moab and TORQUE to use the IB node names but was 
unsuccessful. On reflection, I think my users would be happiest knowing 
that IB bandwidth is dedicated to their apps (MPI, NFS, etc.) rather 
than to resource manager overhead, so I'd rather not go that route.

I've advised my users to consider adding code to their PBS scripts to 
convert the $PBS_NODEFILE and $PBS_GPUFILE contents as they see fit, but 
they'd rather not have to bother. I've experimented with job-specific 
prologue and epilogue scripts, without success. Both $PBS_NODEFILE and 
$PBS_GPUFILE are created with 644 permissions and root ownership, so 
the script, which runs as the job owner, can't overwrite them in place. 
The script could of course write modified node files under other names, 
but that wouldn't let users do anything they couldn't already do right 
in the PBS script itself.
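
For reference, the kind of in-script conversion I have in mind is 
roughly the sketch below. It only assumes our cnxx/icnxx naming; the 
function name to_ib_names and the IB_NODEFILE variable are made up for 
illustration and are not anything TORQUE defines.

```shell
#!/bin/sh
# Sketch only: write an IB-named copy of the node file.
# to_ib_names and IB_NODEFILE are illustrative names, not TORQUE features.

to_ib_names() {
    # Prefix "i" to names that start with cn + digits (cn01 -> icn01).
    # Anything after the digits is kept; other lines pass through as-is.
    sed 's/^cn\([0-9][0-9]*\)/icn\1/' "$1"
}

# Inside a job, convert $PBS_NODEFILE into a user-writable copy.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    IB_NODEFILE="${TMPDIR:-/tmp}/ib_nodefile.${PBS_JOBID:-$$}"
    to_ib_names "$PBS_NODEFILE" > "$IB_NODEFILE"
    export IB_NODEFILE
fi
```

The job would then hand $IB_NODEFILE to the MPI launcher's host file 
option in place of $PBS_NODEFILE. The same function could be applied to 
$PBS_GPUFILE. This is exactly the extra step my users would like to 
avoid.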

According to the TORQUE admin manual, the system prologue and 
epilogue scripts are run as root but with empty environments. If this 
means that $PBS_NODEFILE and $PBS_GPUFILE aren't provided to the prologue 
script, I won't be able to transform the files there.

Can you think of any way I could convert the node files so that they will 
be available via the familiar $PBS_NODEFILE and $PBS_GPUFILE environment 
variables, or is my only hope to reconfigure TORQUE and Moab to use the 
icnxx node names?

Dave Ulrick
d-ulrick at comcast.net

