[torqueusers] Re: LAM-MPI won't boot with torque-1.2.0p6
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Thu Sep 15 15:07:54 MDT 2005
>>However, on 3 nodes it still fails:
> With the case above working, and this one failing, this implies that
> pbs_mom can't talk to the pbs_demux process. Do you have any kind of
> port filtering in place?
No, iptables is off. Port filtering doesn't seem to make sense
on a private network.
# service iptables status
Firewall is stopped.
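Independently of the iptables status, a plain TCP connect probe can help tell a refused connection apart from a filtered one. The following is only a minimal sketch, not part of Torque: the function name probe_tcp and the 3-second timeout are assumptions made for illustration.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Try one TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <errno name>".
    probe_tcp is a hypothetical helper, not a Torque tool.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"
    except ConnectionRefusedError:
        # ECONNREFUSED (errno 111 on Linux): the host answered, but
        # nothing accepted the connection on that port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        return "error: %s" % errno.errorcode.get(exc.errno, str(exc))
    finally:
        sock.close()


if __name__ == "__main__":
    # Example: probe a local port; substitute the host and port of interest.
    print(probe_tcp("127.0.0.1", 34976))
```

A quick "refused" usually means the host is reachable and nothing is listening on that port (or a rule actively rejects the connection), whereas a "timeout" is more typical of packets being silently dropped.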
Speaking of the pbs_demux process, when would that be started?
It's not running on the nodes after I start an interactive PBS job.
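To check this on each node without relying on eyeballing ps output, something like the following could be used. It is a sketch under assumptions: the helper name find_pids is made up, and it relies on the /proc/<pid>/comm files that current Linux kernels provide.

```python
import os


def find_pids(name, proc_root="/proc"):
    """Return the sorted PIDs whose command name equals `name`.

    find_pids is a hypothetical helper; it reads /proc/<pid>/comm,
    which holds the (possibly truncated) command name of each process.
    """
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                if fh.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited, or comm is missing/unreadable
    return sorted(pids)


if __name__ == "__main__":
    print(find_pids("pbs_demux"))
```

An empty list means no process with that name was found at the moment of the check.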
> Actually, if you configured torque with --enable-syslog you should have
> errors related to open_demux() in your syslog.
Right you are, I see some errors! On the PBS job master node:
Sep 15 20:58:27 n469 pbs_mom: Connection refused (111) in open_demux,
open_demux: connect 127.0.0.1:34976
Sep 15 21:29:43 n469 pbs_mom: Connection refused (111) in open_demux,
open_demux: connect 127.0.0.1:34987
Sep 15 21:34:17 n469 pbs_mom: Connection refused (111) in open_demux,
open_demux: connect 127.0.0.1:34999
On a slave node:
Sep 15 20:58:33 n478 pbs_mom: Connection refused (111) in open_demux,
open_demux: connect 10.1.130.219:34976
Sep 15 21:29:49 n478 pbs_mom: Connection refused (111) in open_demux,
open_demux: connect 10.1.130.219:34987
Sep 15 21:34:23 n478 pbs_mom: Connection refused (111) in open_demux,
open_demux: connect 10.1.130.219:34999
Here 10.1.130.219 is the IP-address of the job master node, n469.
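To compare the entries from the two nodes side by side, the syslog excerpts above can be parsed with a few lines of Python. This is only an illustrative sketch: the regular expression is fitted to the lines quoted in this message (including the line wrap after "open_demux,"), and the names PATTERN, SAMPLE and summarize are made up here.

```python
import re
from collections import defaultdict

# Matches one (possibly wrapped) pbs_mom open_demux error entry.
PATTERN = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+[\d:]+)\s+(?P<node>\S+)\s+pbs_mom:\s+"
    r"(?P<msg>.+?)\s+\((?P<errno>\d+)\)\s+in open_demux,\s*"
    r"open_demux: connect\s+(?P<ip>[\d.]+):(?P<port>\d+)",
    re.MULTILINE,
)

# Log lines copied from the message above.
SAMPLE = """\
Sep 15 20:58:27 n469 pbs_mom: Connection refused (111) in open_demux,
open_demux: connect 127.0.0.1:34976
Sep 15 21:29:43 n469 pbs_mom: Connection refused (111) in open_demux,
open_demux: connect 127.0.0.1:34987
Sep 15 21:34:17 n469 pbs_mom: Connection refused (111) in open_demux,
open_demux: connect 127.0.0.1:34999
Sep 15 20:58:33 n478 pbs_mom: Connection refused (111) in open_demux,
open_demux: connect 10.1.130.219:34976
Sep 15 21:29:49 n478 pbs_mom: Connection refused (111) in open_demux,
open_demux: connect 10.1.130.219:34987
Sep 15 21:34:23 n478 pbs_mom: Connection refused (111) in open_demux,
open_demux: connect 10.1.130.219:34999
"""


def summarize(text):
    """Group the connect targets (ip, port) by the node that logged them."""
    targets = defaultdict(list)
    for match in PATTERN.finditer(text):
        targets[match.group("node")].append(
            (match.group("ip"), int(match.group("port")))
        )
    return dict(targets)


if __name__ == "__main__":
    for node, attempts in summarize(SAMPLE).items():
        print(node, attempts)
```

Grouped this way, both nodes are seen trying the same three ports (34976, 34987, 34999), the master via 127.0.0.1 and the slave via 10.1.130.219.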
It seems to me we're getting closer, but what config parameter
would control access to open_demux()?
FYI, I've installed these Torque RPMs (based somewhat on your
torque.spec file) on the nodes:
# rpm -qa | grep torque
The nodes have the following files installed in /usr/sbin:
# ls -la /usr/sbin/pbs*
-rwxr-xr-x 1 root root 15950 Sep 13 13:46 /usr/sbin/pbs_demux
-rwsr-xr-x 1 root root 85882 Sep 13 13:46 /usr/sbin/pbs_iff
-rwx------ 1 root root 697041 Sep 13 13:46 /usr/sbin/pbs_mom
-rwsr-xr-x 1 root root 36913 Sep 13 13:46 /usr/sbin/pbs_rcp
So pbs_demux is actually installed. It's part of the torque-client
RPM, but shouldn't it be part of the torque-mom RPM instead?