[torqueusers] connection to Premature end of message dropped?
jacksond at clusterresources.com
Tue Jan 31 15:56:44 MST 2006
My first guess would be that the pbs_mom daemons do not trust the
pbs_server due to either a multi-homed host issue or a conflict in the
mom_priv/config configuration. A first diagnostic step is to run
'momctl -d 3' on node1-mpi. This should indicate which host the
mom expects the server to be and will print warnings if that
communication is not occurring. Also, increasing the mom loglevel and
looking at the mom logs may show which IP address the server-to-mom
communication is using, and the logs will complain if that address is
not the one the mom expects.
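
To check the multi-homed possibility independently of TORQUE, a small
script along the following lines can be run on a compute node. It is
only a sketch: it resolves the server's hostname to its IPv4 addresses
and tests whether the address seen in the mom logs is among them. The
function names, and the hostname and addresses in the usage note, are
hypothetical placeholders, and the "trust" test is a simplification for
illustration, not pbs_mom's actual logic.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted list of IPv4 addresses that hostname resolves to.

    An empty list means the name could not be resolved on this node.
    """
    try:
        infos = socket.getaddrinfo(
            hostname, None, socket.AF_INET, socket.SOCK_STREAM
        )
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def is_expected_source(observed_ip, server_hostname, resolver=resolve_ipv4):
    """True if observed_ip is one of the addresses server_hostname resolves to.

    Simplified stand-in for the comparison a mom might make; the real
    check inside pbs_mom may differ.
    """
    return observed_ip in resolver(server_hostname)
```

For example, if the mom logs show server traffic arriving from
192.168.1.1 while resolve_ipv4("master") on the node returns only
["10.0.0.1"], then is_expected_source("192.168.1.1", "master") is False,
which would be consistent with the server talking over a different
interface than the one the node associates with its name.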
Best of luck and let us know how this goes.
On Tue, 2006-01-31 at 17:43 -0500, Tom Combs wrote:
> I'm trying to get torque-2.0.0p7 running under Suse 10 on AMD
> Opterons. I start up the server on the master and the moms on the
> compute nodes but pbsnodes says that all the nodes are down. I see the
> following message for each node in my server_logs file:
> 01/31/2006 17:31:03;0001;PBS_Server;Svr;PBS_Server;stream_eof,
> connection to Premature end of message dropped (node1-mpi). setting
> node state to down
> I do have network access between all the nodes.
> Any pointers on where to look for the problem?
> Thanks, Tom Combs