[torqueusers] Adding a new node requires restart of pbs_server (bug)

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Fri Sep 30 04:41:33 MDT 2005

I'm seeing parallel jobs refusing to start correctly when the MOM
superior of the job runs on a node which has just been added to
the cluster.  Another node's MOM in the sisterhood logs this:

09/30/2005 11:35:32;0001;   pbs_mom;Svr;pbs_mom;im_request, bad connect from - unauthorized (okclients:,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
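
For anyone who wants to check whether other MOMs are logging the same rejection, here is a minimal Python sketch that scans MOM log lines for this message. It assumes the semicolon-separated layout of the line quoted above; the function name find_rejected_peers is made up for illustration, and the peer address comes back empty when the log line carries none, as in my excerpt.

```python
import re

# Assumed layout, taken from the log line quoted above:
# timestamp;event code;daemon;class;object name;message
BAD_CONNECT = re.compile(
    r"im_request, bad connect from\s*(?P<peer>\S*)\s*-\s*unauthorized"
)


def find_rejected_peers(lines):
    """Return (timestamp, peer) pairs for lines reporting an unauthorized connect.

    Hypothetical helper: the peer string is whatever sits between
    "bad connect from" and "- unauthorized", and may be empty.
    """
    hits = []
    for line in lines:
        # Split into at most six fields so semicolons inside the message survive.
        fields = [field.strip() for field in line.split(";", 5)]
        if len(fields) < 6:
            continue  # not a log line in the assumed format
        match = BAD_CONNECT.search(fields[5])
        if match:
            hits.append((fields[0], match.group("peer")))
    return hits
```

Feeding it the lines of a MOM log file (for example via open(path) on whatever log path your installation uses) gives one entry per rejected connection, which makes it easy to see which nodes are still refusing the new one.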

The node was just added to the cluster, and is being
refused by other MOMs.  I googled for this error message and found
a workaround here:
You have to shut down pbs_server (and Maui) and restart them.
This solves the problem.

So this is a real bug in Torque, and not due to an unclean shutdown
of a node's pbs_mom.  If you add a new node to the cluster, it seems
that you need to restart pbs_server.  Not very elegant :-(

Actually, I just now found this bug in the Torque Bugzilla at
so I can add that it's reproduced at other sites as well.
Restarting the pbs_mom on nodes is of course an unacceptable
workaround in a production environment, but the pbs_server
restart seems to do the trick.

I'm running Torque 1.2.0p6 (as distributed) on CentOS 4.1 Linux
(a RHEL 4.0 clone).

Ole Holm Nielsen
Department of Physics, Technical University of Denmark
