[torqueusers] Directly linked nodes via crossover cable

Garrick Staples garrick at usc.edu
Sun Feb 19 23:35:31 MST 2006


On Mon, Feb 13, 2006 at 08:26:20AM -0800, Aaron Greenwood alleged:
> 
> Consider the following hardware configuration:
> 
> NODE 1 (2 CPUS)
> eth0 - Connected to cluster Ethernet switch.
> eth1 - Directly linked via crossover cable to NODE 2
> 
> NODE 2 (2 CPUS)
> eth0 - Connected to cluster Ethernet switch.
> eth1 - Directly linked via crossover cable to NODE 1
> 
> Is it possible to configure PBS in such a way that a parallel job
> submitted from the head node will use all CPUS on NODE 1 and NODE 2
> running over the Ethernet cards that are directly linked?

Not directly, no.

 
> The directly linked cards are on a private network listed in the local
> hosts file.
> 
> I talked with a guy who does this. He said that in the script that he
> submits his jobs he modifies the machine_file as in lamboot -s
> machine_file. When I do that the jobs run using the Ethernet cards
> connected to the cluster switch. I checked this by logging on to both of
> the nodes and checking traffic with tcpdump and running lamnodes.

Do exactly what that guy said.  PBS passes the list of node names to a
job by writing them to the file named in $PBS_NODEFILE.  Your job would
simply make a local copy of $PBS_NODEFILE, transforming the hostnames to
match those of the directly linked interfaces.

For example, if $PBS_NODEFILE listed "node01" and "node02", which refer
to the switched interfaces, and "node01-direct" and "node02-direct"
resolve to the direct interfaces, your job could do something simple like:
  sed 's/$/-direct/' "$PBS_NODEFILE" > /tmp/machine_file
And then pass /tmp/machine_file to lamboot.
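
Put together, the relevant part of a job script might look roughly like
the sketch below. It is only an illustration under some assumptions:
every node has a "<name>-direct" alias in its hosts file, the helper
name to_direct and the per-job machine file path are made up for this
example, and the job is expected to shut LAM down again with lamhalt
when it finishes.

```shell
#!/bin/sh
# Sketch only: to_direct is a hypothetical helper, not part of PBS or LAM.

# Print the given node file with "-direct" appended to every hostname.
# Assumes each node has a matching "<name>-direct" entry in the hosts file.
# Blank lines, if any, are left untouched.
to_direct() {
    sed '/./s/$/-direct/' "$1"
}

# Only act when running under PBS, where PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    # Per-job file name so concurrent jobs on a node do not collide
    # (the path itself is an assumption; any writable location works).
    machine_file="${TMPDIR:-/tmp}/machine_file.${PBS_JOBID:-$$}"

    to_direct "$PBS_NODEFILE" > "$machine_file"
    lamboot "$machine_file"

    # ... run the MPI program here ...

    lamhalt
    rm -f "$machine_file"
fi
```

Using a per-job file name instead of a fixed /tmp/machine_file is just a
precaution for nodes that may run more than one job at a time.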

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California