[torqueusers] bad connect from x.x.x.x

Bill Wichser bill at Princeton.EDU
Wed Sep 22 10:56:44 MDT 2004


torque-1.1.0p0
maui-3.2.6
mpiexec-0.76

This has come up before with no solutions.

The message on the afflicted node is:

pbs_mom;Svr;pbs_mom;im_request, bad connect from 172.16.0.33:1023 - unauthorized (okclients:172.16.100.1,172.16.0.37,127.0.0.1)

100.1 is the head node, listed in the clients as a $clienthost.
0.33 is the master node in the MPI code.
0.37 is this client node.
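
To make the mismatch explicit, here is a minimal Python sketch that pulls the source address and the okclients list out of the message above and checks whether the sender is in the list. The regular expression and the helper name parse_bad_connect are my own, inferred from this single log line; they are not part of torque and may not match other pbs_mom message variants.

```python
import re

# Pattern for the pbs_mom "bad connect" message quoted above.
# The field layout is an assumption inferred from this one log line.
BAD_CONNECT = re.compile(
    r"bad connect from (?P<ip>\d+(?:\.\d+){3}):(?P<port>\d+)"
    r".*?\(okclients:(?P<ok>[^)]*)\)"
)


def parse_bad_connect(line):
    """Return (source_ip, source_port, okclients), or None if no match."""
    m = BAD_CONNECT.search(line)
    if m is None:
        return None
    ok = [addr.strip() for addr in m.group("ok").split(",") if addr.strip()]
    return m.group("ip"), int(m.group("port")), ok


line = (
    "pbs_mom;Svr;pbs_mom;im_request, bad connect from 172.16.0.33:1023"
    " - unauthorized (okclients:172.16.100.1,172.16.0.37,127.0.0.1)"
)

src_ip, src_port, okclients = parse_bad_connect(line)

# True when the connecting node is absent from the MOM's okclients list.
missing = src_ip not in okclients
print(src_ip, "missing from okclients:", missing)
```

For the message above, the sender 172.16.0.33 (the MPI master node) is not among the three okclients entries, which is why the MOM on 0.37 rejects it.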

According to my understanding, the head node builds my client list from 
the server_priv/nodes file and ships this to the MOM on job start.  I am 
using mpiexec for this startup.

Sometimes a restart of the pbs_server as well as all the pbs_moms on the 
clients fixes this problem.  Other times it takes multiple restarts to 
correct it.

I have tried listing each and every node, by nodename, as a $clienthost 
in the mom_priv/config file to no avail.  Perhaps adding the IP address 
might help.  But there seems to be something wrong somewhere either in 
the server or the mom, I'm not really sure which.

The situation arises when a node is rebooted or the pbs_mom gets 
restarted, but it doesn't happen very often even in those cases.

Can anyone offer a suggestion?  Is torque-1.1.0p1 a possible solution? 
Are the mpiexec patches for input/output redirect really installed in 
that release?

Thanks,

Bill


