[torqueusers] MOM communication problem

Thomas Vojta vojtat at umr.edu
Wed Aug 11 13:41:46 MDT 2004


Hi all,

I have run into what I guess is a communication problem
between pbs_server and pbs_mom. It has been
discussed here before, but none of the suggestions seems
to work.


- my nodes are never detected, and in the output of pbsnodes -a
they are all marked as "state-unknown,down"

- the pbs_mom logs show only lines like
08/11/2004 13:34:53;0001;   pbs_mom;Svr;pbs_mom;im_eof, End of File from addr 192.168.0.254:15001
08/11/2004 13:43:33;0001;   pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr 192.168.0.254:15001
08/11/2004 14:16:03;0001;   pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr 192.168.0.254:15001

or

08/11/2004 14:22:54;0001;   pbs_mom;Svr;pbs_mom;im_eof, End of File from addr 192.168.0.254:15001
08/11/2004 14:23:24;0001;   pbs_mom;Svr;pbs_mom;im_eof, End of File from addr 192.168.0.254:15001
08/11/2004 14:23:54;0001;   pbs_mom;Svr;pbs_mom;im_eof, End of File from addr 192.168.0.254:15001
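
In case it helps anyone compare logs, here is a rough sketch (Python) of
how records like the ones above could be split and tallied. It assumes each
record sits on a single line with six semicolon-separated fields; the field
names (timestamp, code, daemon, obj_class, obj_name, message) and the
function names are my own labels for illustration, not anything taken from
the TORQUE documentation.

```python
from collections import Counter

# Sample records copied from the pbs_mom log excerpts above,
# each one joined onto a single line.
SAMPLE = [
    "08/11/2004 13:34:53;0001;   pbs_mom;Svr;pbs_mom;im_eof, End of File from addr 192.168.0.254:15001",
    "08/11/2004 13:43:33;0001;   pbs_mom;Svr;pbs_mom;im_eof, Premature end of message from addr 192.168.0.254:15001",
    "08/11/2004 14:22:54;0001;   pbs_mom;Svr;pbs_mom;im_eof, End of File from addr 192.168.0.254:15001",
]

# Hypothetical field labels; the log itself does not name its columns.
FIELDS = ("timestamp", "code", "daemon", "obj_class", "obj_name", "message")


def parse_mom_log_line(line):
    """Split one record into a dict of fields, or return None if it does not fit."""
    # Split on at most five semicolons so the message text stays in one piece.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, (part.strip() for part in parts)))


def tally_messages(lines):
    """Count how often each distinct message text occurs."""
    counts = Counter()
    for line in lines:
        record = parse_mom_log_line(line)
        if record is not None:
            counts[record["message"]] += 1
    return counts


if __name__ == "__main__":
    for message, count in tally_messages(SAMPLE).most_common():
        print(count, message)
```

Lines that do not have six fields (for example, records still wrapped
across two lines) are simply skipped, so wrapped logs would need to be
rejoined first.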

I use TORQUE-1.1.0p0 on a 64-node cluster connected via a private Gigabit
network; the server has 2 NICs. I tried the various suggestions made on this
board (moving the server entry to the first position in the pbs_mom config
file, and using only the host names of the internal network for pbs_server
and in the config files).


Any other suggestions? Has there been an "official solution" to this issue?

Thanks a lot
Thomas






------------------------------------------------------------------
Thomas Vojta                                   phone: 573-341-4793
Assistant Professor                              fax: 573-341-4715
Department of Physics
University of Missouri-Rolla                 mailto:vojtat at umr.edu
1870 Miner Circle                       mailto:thomas at vojtanet.com
Rolla, MO 65409                         http://www.umr.edu/~vojtat


