[torqueusers] Torque-1.2.0p5 Server to mom communication error

Clifton Kirby ckirby3 at colsa.com
Thu Sep 1 12:36:32 MDT 2005


I never got a response to this post, so I am posting it again.
Does anyone else use the --disable-rpp option on larger clusters? I didn't
see this problem until I added this option, but it was recommended for larger
clusters, and ours is over 3000 processors. Thanks in advance.

----------------------------------------------------------------------------


Running on Mac OS X 10.4.2 using Myrinet.

I used gcc 4.0 to compile torque-1.2.0p5, and the configure line I used is
as follows:

./configure --prefix=/opt/torque --with-scp --enable-server --set-sched=c \
    --enable-docs --enable-mom --enable-clients --enable-syslog \
    --set-server-home=/private/var/spool/torque \
    --set-default-server=mach5c.mach5.roc \
    --disable-filesync --disable-gui --disable-rpp

The following messages are being logged in the mom_logs:

----------------------------------------------------------------------------
08/19/2005 13:40:25;0001;   pbs_mom;Svr;pbs_mom;Unknown error: 0 (0) in rm_request, bad attempt to connect - unauthorized (port: 59791)
        message refused from port 59791 addr 172.16.21.254
08/19/2005 13:44:25;0001;   pbs_mom;Svr;pbs_mom;Unknown error: 0 (0) in rm_request, bad attempt to connect - unauthorized (port: 62920)
        message refused from port 62920 addr 172.16.21.254
08/19/2005 13:45:25;0001;   pbs_mom;Svr;pbs_mom;Unknown error: 0 (0) in rm_request, bad attempt to connect - unauthorized (port: 63449)
        message refused from port 63449 addr 172.16.21.254
08/19/2005 13:46:25;0001;   pbs_mom;Svr;pbs_mom;Unknown error: 0 (0) in rm_request, bad attempt to connect - unauthorized (port: 63978)
        message refused from port 63978 addr 172.16.21.254
08/19/2005 13:47:25;0001;   pbs_mom;Svr;pbs_mom;Unknown error: 0 (0) in rm_request, bad attempt to connect - unauthorized (port: 64507)
        message refused from port 64507 addr 172.16.21.254
----------------------------------------------------------------------------

It seems that the server to mom communication, as logged by the mom, is being
attempted from a range of ports outside the standard 15001-15004. Should I
reserve a range of ports in /etc/services?
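
To make the port observation easier to check, here is a minimal Python
sketch that pulls the refused port and address out of the log entries quoted
above and compares them with the 15001-15004 range. The helper names
(refused_connections, summarize) are made up for illustration and are not
part of Torque. The 1024 cutoff is the usual boundary for privileged ports;
whether pbs_mom rejects these connections because of the source port is an
assumption that the log itself does not confirm.

```python
import re

# Entries copied from the mom log quoted above, unwrapped to one line each.
LOG = """\
08/19/2005 13:40:25;0001;   pbs_mom;Svr;pbs_mom;Unknown error: 0 (0) in rm_request, bad attempt to connect - unauthorized (port: 59791)
        message refused from port 59791 addr 172.16.21.254
08/19/2005 13:44:25;0001;   pbs_mom;Svr;pbs_mom;Unknown error: 0 (0) in rm_request, bad attempt to connect - unauthorized (port: 62920)
        message refused from port 62920 addr 172.16.21.254
08/19/2005 13:45:25;0001;   pbs_mom;Svr;pbs_mom;Unknown error: 0 (0) in rm_request, bad attempt to connect - unauthorized (port: 63449)
        message refused from port 63449 addr 172.16.21.254
08/19/2005 13:46:25;0001;   pbs_mom;Svr;pbs_mom;Unknown error: 0 (0) in rm_request, bad attempt to connect - unauthorized (port: 63978)
        message refused from port 63978 addr 172.16.21.254
08/19/2005 13:47:25;0001;   pbs_mom;Svr;pbs_mom;Unknown error: 0 (0) in rm_request, bad attempt to connect - unauthorized (port: 64507)
        message refused from port 64507 addr 172.16.21.254
"""

# Matches the "message refused" continuation line that follows each entry.
REFUSED = re.compile(r"message refused from port (\d+) addr (\d+\.\d+\.\d+\.\d+)")

STANDARD_PORTS = range(15001, 15005)  # 15001-15004, as mentioned in the post
PRIVILEGED_LIMIT = 1024               # ports below this are privileged


def refused_connections(text):
    """Return (address, port) pairs for every refused message in the log text."""
    return [(addr, int(port)) for port, addr in REFUSED.findall(text)]


def summarize(text):
    """Group the refused ports by how they relate to the two port ranges."""
    conns = refused_connections(text)
    ports = [port for _, port in conns]
    return {
        "addresses": sorted({addr for addr, _ in conns}),
        "ports": ports,
        "in_standard_range": [p for p in ports if p in STANDARD_PORTS],
        "privileged": [p for p in ports if p < PRIVILEGED_LIMIT],
    }


if __name__ == "__main__":
    for key, value in summarize(LOG).items():
        print(f"{key}: {value}")
```

For the entries shown here, every refused message comes from 172.16.21.254,
and none of the source ports falls in 15001-15004 or below 1024.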

- Cliff


