[torqueusers] Directly linked nodes via crossover cable
Garrick Staples
garrick at usc.edu
Sun Feb 19 23:35:31 MST 2006
On Mon, Feb 13, 2006 at 08:26:20AM -0800, Aaron Greenwood alleged:
>
> Consider the following hardware configuration:
>
> NODE 1 (2 CPUS)
> eth0 - Connected to cluster Ethernet switch.
> eth1 - Directly linked via crossover cable to NODE 2
>
> NODE 2 (2 CPUS)
> eth0 - Connected to cluster Ethernet switch.
> eth1 - Directly linked via crossover cable to NODE 1
>
> Is it possible to configure PBS in such a way that a parallel job
> submitted from the head node will use all CPUS on NODE 1 and NODE 2
> running over the Ethernet cards that are directly linked?
Not directly, no.
> The directly linked cards are on a private network listed in the local
> hosts file.
>
> I talked with a guy who does this. He said that in the script that he
> submits his jobs he modifies the machine_file as in lamboot -s
> machine_file. When I do that the jobs run using the Ethernet cards
> connected to the cluster switch. I checked this by logging on to both of
> the nodes and checking traffic with tcpdump and running lamnodes.
Exactly what that guy said. PBS passes the list of node names to a job
by writing them to the file named in $PBS_NODEFILE. Your job would simply
make a local copy of that file, transforming the hostnames to match
those of the directly linked interfaces.
For example, if $PBS_NODEFILE had "node01" and "node02", which refer to
the switched interfaces, and "node01-direct" and "node02-direct" refer
to the direct interfaces, your job could do something simple like:
    sed 's/$/-direct/' < "$PBS_NODEFILE" > /tmp/machine_file
And then use /tmp/machine_file with lamboot.
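Put together in a job script, that might look roughly like the sketch
below. It assumes the "-direct" names resolve on every node (for example
via the local hosts file). The make_direct_nodefile helper and the
per-job file name are just illustrative choices, not anything PBS or LAM
provides.

```shell
#!/bin/sh
# Sketch only: rewrite the PBS node list so each hostname points at the
# direct-link interface. Assumes names of the form <hostname>-direct.

# make_direct_nodefile SRC DEST (hypothetical helper):
# copy SRC to DEST, appending "-direct" to every line.
make_direct_nodefile() {
    sed 's/$/-direct/' < "$1" > "$2"
}

# $PBS_NODEFILE is only set inside a running PBS job; skip otherwise.
if [ -n "${PBS_NODEFILE:-}" ]; then
    # Per-job file name (PID suffix) so concurrent jobs do not collide.
    machine_file="/tmp/machine_file.$$"
    make_direct_nodefile "$PBS_NODEFILE" "$machine_file"
    # Boot LAM on the rewritten host list.
    lamboot "$machine_file"
fi
```

After lamboot, lamnodes and tcpdump on eth1 are a reasonable way to
confirm the traffic is really going over the direct link.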
--
Garrick Staples, Linux/HPCC Administrator
University of Southern California