[torqueusers] Re: LAM-MPI won't boot with torque-1.2.0p6

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu Sep 15 15:07:54 MDT 2005


garrick wrote:
>>However, on 3 nodes it still fails:
> 
> With the case above working, and this one failing, this implies that
> pbs_mom can't talk to the pbs_demux process.  Do you have any kind of
> port filtering in place?

No, iptables is off. Port filtering doesn't seem to make sense
on a private network.

# service iptables status
Firewall is stopped.

Speaking of a pbs_demux process, when would that be started?
It's not running on the nodes after I start an interactive PBS job.

> Actually, if you configured torque with --enable-syslog you should have
> errors related to open_demux() in your syslog.

Right you are, I see some errors! On the PBS job master node:

Sep 15 20:58:27 n469 pbs_mom: Connection refused (111) in open_demux, open_demux: connect 127.0.0.1:34976
Sep 15 21:29:43 n469 pbs_mom: Connection refused (111) in open_demux, open_demux: connect 127.0.0.1:34987
Sep 15 21:34:17 n469 pbs_mom: Connection refused (111) in open_demux, open_demux: connect 127.0.0.1:34999

On a slave node:

Sep 15 20:58:33 n478 pbs_mom: Connection refused (111) in open_demux, open_demux: connect 10.1.130.219:34976
Sep 15 21:29:49 n478 pbs_mom: Connection refused (111) in open_demux, open_demux: connect 10.1.130.219:34987
Sep 15 21:34:23 n478 pbs_mom: Connection refused (111) in open_demux, open_demux: connect 10.1.130.219:34999

Here 10.1.130.219 is the IP-address of the job master node, n469.
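To compare entries across nodes, it can help to pull them out of syslog mechanically. The sketch below is a hypothetical helper (`parse_demux_failures` is not part of Torque), and its regex is written only against the lines quoted above, assuming each entry sits on one line in the log file (the wrapping above comes from the mail client).

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Matches the pbs_mom open_demux entries quoted in this message. Assumption:
# one syslog entry per line, in the exact wording shown above.
OPEN_DEMUX_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+pbs_mom:\s+"
    r"Connection refused \((?P<errno>\d+)\) in open_demux, "
    r"open_demux: connect (?P<ip>[\d.]+):(?P<port>\d+)$"
)


class DemuxFailure(NamedTuple):
    stamp: str   # syslog timestamp, e.g. "Sep 15 20:58:27"
    host: str    # node that logged the error
    errno: int   # errno reported by pbs_mom (111 in these logs)
    ip: str      # address pbs_mom tried to connect to
    port: int    # port pbs_mom tried to connect to


def parse_demux_failures(lines: Iterable[str]) -> Iterator[DemuxFailure]:
    """Yield one DemuxFailure per matching open_demux entry; skip other lines."""
    for line in lines:
        m = OPEN_DEMUX_RE.match(line.strip())
        if m:
            yield DemuxFailure(
                m["stamp"], m["host"], int(m["errno"]), m["ip"], int(m["port"])
            )
```

Grouping the results by `port` makes it easy to see that the master node (connecting to 127.0.0.1) and the slave node (connecting to 10.1.130.219) were trying to reach the same port for each job, roughly six seconds apart.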

It seems to me we're getting closer, but what config parameter
would control access to open_demux()?
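For reference, errno 111 on Linux is ECONNREFUSED: the target host answered, but nothing was accepting connections on that port at that moment. That pattern would be consistent with pbs_demux not running, rather than with a filtering rule. A plain TCP probe shows the same distinction outside Torque; `probe` below is a hypothetical helper, and since the ports in the logs change from job to job (34976, 34987, 34999), it is only meaningful against the port of a job that is still running.

```python
import errno
import socket


def probe(host: str, port: int, timeout: float = 2.0) -> str:
    """Attempt a TCP connect and describe the outcome (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"
    except ConnectionRefusedError as exc:
        # The peer replied with a reset: no process is accepting on this port.
        # On Linux exc.errno is 111, whose symbolic name is ECONNREFUSED.
        return f"refused ({errno.errorcode[exc.errno]}, errno {exc.errno})"
    except socket.timeout:
        # No reply at all is more typical of a firewall silently dropping packets.
        return "timed out"
```

For example, `probe("10.1.130.219", 34999)` run from n478 while the job is active would report `refused (...)` if nothing is listening on the master node at that port.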

FYI, I've installed these Torque RPMs (based somewhat on your
torque.spec file) on the nodes:

# rpm -qa | grep torque
torque-1.2.0p6-1.fys
torque-mom-1.2.0p6-1.fys
torque-client-1.2.0p6-1.fys

The nodes have the following files installed in /usr/sbin:

# ls -la /usr/sbin/pbs*
-rwxr-xr-x  1 root root  15950 Sep 13 13:46 /usr/sbin/pbs_demux
-rwsr-xr-x  1 root root  85882 Sep 13 13:46 /usr/sbin/pbs_iff
-rwx------  1 root root 697041 Sep 13 13:46 /usr/sbin/pbs_mom
-rwsr-xr-x  1 root root  36913 Sep 13 13:46 /usr/sbin/pbs_rcp

So pbs_demux is actually installed. It's part of the torque-client
RPM, but shouldn't it be part of the torque-mom RPM instead?

Thanks,
Ole


