[torqueusers] Adding a new node requires restart of pbs_server (bug)
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Fri Sep 30 04:41:33 MDT 2005
I'm seeing parallel jobs fail to start correctly when the job's
Mother Superior (the MOM on the job's first node) runs on a node
which has just been added to the cluster. Another node's MOM in
the job's sisterhood logs this:
09/30/2005 11:35:32;0001; pbs_mom;Svr;pbs_mom;im_request, bad connect from
10.1.129.7:1023 - unauthorized (okclients:
10.1.129.139,10.1.129.138,10.1.129.137,10.1.129.136,10.1.129.135,
10.1.129.134,10.1.129.133,10.1.129.132,10.1.129.131,
10.1.129.130,10.1.129.129,10.1.129.128,10.1.129.127,10.1.129.126,10.1.129.125,
10.1.129.124,10.1.129.123,10.1.129.122,10.1.129.121,10.1.129.120,10.1.129.119,
10.1.129.118,10.1.129.117,10.1.129.116,10.1.129.115,10.1.129.114,10.1.129.113,
10.1.129.112,10.1.129.111,10.1.129.110,10.1.129.109,10.1.129.108,10.1.129.107,
10.1.129.106,10.1.129.105,10.1.129.104,10.1.129.103,10.1.129.102,10.1.129.101,
10.1.129.100,10.1.129.159,10.1.129.219,10.1.130.19,10.1.130.202,10.1.128.2,
10.1.130.218,127.0.0.1)
The node 10.1.129.7 was just added to the cluster, and is being
refused by other MOMs. I googled for this error message and found
a workaround here:
http://www.supercluster.org/pipermail/torqueusers/2004-September/000746.html
You have to shut down pbs_server (and Maui) and restart them.
This solves the problem.
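To check whether a given MOM is still rejecting a node, one can
compare the rejected address with the okclients list in a log
entry like the one quoted above. The following is only a rough
sketch: the function name parse_okclients_rejection and the
shortened sample entry are made up for illustration, and the
pattern assumes the message format looks exactly like the entry
shown in this post.

```python
import re

# Pattern for the pbs_mom "bad connect" message as it appears in this post.
# Assumption: other Torque versions may word the message differently.
_BAD_CONNECT = re.compile(
    r"bad connect from\s+(\d+\.\d+\.\d+\.\d+):\d+"  # rejected address and port
    r".*?\(okclients:\s*([^)]*)\)"                   # comma-separated allowed list
)


def parse_okclients_rejection(log_text):
    """Return (rejected_ip, okclients) for a 'bad connect' entry, or None.

    Hypothetical helper, not part of Torque. The entry may be wrapped
    over several lines, so whitespace is collapsed before matching.
    """
    flat = " ".join(log_text.split())
    match = _BAD_CONNECT.search(flat)
    if match is None:
        return None
    rejected_ip = match.group(1)
    okclients = [ip.strip() for ip in match.group(2).split(",") if ip.strip()]
    return rejected_ip, okclients


# Shortened version of the log entry from this post (most addresses omitted).
sample = """09/30/2005 11:35:32;0001; pbs_mom;Svr;pbs_mom;im_request, bad connect from
10.1.129.7:1023 - unauthorized (okclients:
10.1.129.139,10.1.129.138,
10.1.128.2,127.0.0.1)"""

result = parse_okclients_rejection(sample)
if result is not None:
    rejected_ip, okclients = result
    # True means the MOM that wrote this entry does not know the new node.
    node_is_unknown = rejected_ip not in okclients
```

If node_is_unknown is true for entries written after the
pbs_server restart, the MOM that logged them still has a stale
okclients list.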
So this is a real bug in Torque, and not due to an unclean shutdown
of a node's pbs_mom. If you add a new node to the cluster, it seems
that you need to restart pbs_server. Not very elegant :-(
Actually, I just now found this bug in the Torque Bugzilla at
http://www.clusterresources.com/bugzilla/show_bug.cgi?id=91
so I can add that it's reproduced at other sites as well.
Restarting the pbs_mom on nodes is of course an unacceptable
workaround in a production environment, but the pbs_server
restart seems to do the trick.
I'm running Torque 1.2.0p6 (as distributed) on CentOS 4.1 Linux
(a RHEL 4.0 clone).
--
Ole Holm Nielsen
Department of Physics, Technical University of Denmark