[torqueusers] Problems with jobs hanging: bad connect from 172.16.0.27:1023 - unauthorized

Peter Mardahl peterm at CSUA.Berkeley.EDU
Wed Dec 15 14:18:58 MST 2004

Hello,

  I'm using torque-1.1.0p2 and am having some problems.
Jobs fairly often fail to start from the queue, apparently because of this error:

12/14/2004 18:31:10;0001;   pbs_mom;Svr;pbs_mom;im_request, bad connect from 172.16.0.27:1023 - unauthorized (okclients: 172.16.0.3,172.16.0.2,172.16.0.1,172.16.100.1,172.16.0.20,127.0.0.1)
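
To make the mismatch explicit, the log message can be split into the refused address and the okclients list that pbs_mom was actually using. This is a minimal Python sketch; the function name `parse_bad_connect` is hypothetical, and the regular expression assumes every such message has the same `bad connect from IP:PORT - unauthorized (okclients: ...)` shape as the single sample shown here.

```python
import re

# Pattern assumed from the one pbs_mom sample message above.
BAD_CONNECT = re.compile(
    r"bad connect from (?P<ip>[\d.]+):(?P<port>\d+)\s*-\s*unauthorized"
    r"\s*\(okclients:\s*(?P<ok>[^)]*)\)"
)


def parse_bad_connect(line):
    """Return (rejected_ip, port, okclients), or None if the line does not match."""
    m = BAD_CONNECT.search(line)
    if m is None:
        return None
    okclients = [addr.strip() for addr in m.group("ok").split(",") if addr.strip()]
    return m.group("ip"), int(m.group("port")), okclients


log_line = (
    "12/14/2004 18:31:10;0001;   pbs_mom;Svr;pbs_mom;im_request, "
    "bad connect from 172.16.0.27:1023 - unauthorized "
    "(okclients: 172.16.0.3,172.16.0.2,172.16.0.1,172.16.100.1,172.16.0.20,127.0.0.1)"
)

ip, port, ok = parse_bad_connect(log_line)
# Show the refused address, its source port, and whether it was in okclients.
print(ip, port, ip in ok)
```

Run against the line above, it reports 172.16.0.27 as absent from the six-entry okclients list.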


I looked through the mailing list archives, and someone said the workaround
for this issue was to list every machine as a $clienthost in the pbs_mom
config file, like this:

$logevent 0x1ff
$clienthost head
$clienthost node001
$clienthost node002
$clienthost node003
$clienthost node004
$clienthost node005
$clienthost node006
$clienthost node007
$clienthost node008
$clienthost node009
$clienthost node010
$clienthost node011
$clienthost node012
$clienthost node013
$clienthost node014
$clienthost node015
$clienthost node016
$clienthost node017
$clienthost node018
$clienthost node019
$clienthost node020
$clienthost node021
$clienthost node022
$clienthost node023
$clienthost node024
$clienthost node025
$clienthost node026
$clienthost node027
$clienthost node028
$clienthost node029
$clienthost node030
$clienthost node031
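
Since the list is completely regular, it can be generated rather than typed by hand, which avoids typos in a 33-line file. A minimal Python sketch, assuming node names follow the zero-padded nodeNNN pattern above and the head node is called head; `make_mom_config` and its parameters are hypothetical names chosen for illustration.

```python
def make_mom_config(head="head", num_nodes=31, logevent="0x1ff"):
    """Build pbs_mom config text with one $clienthost line per host."""
    lines = [f"$logevent {logevent}", f"$clienthost {head}"]
    # Zero-pad to three digits to match node001 ... node031.
    lines += [f"$clienthost node{i:03d}" for i in range(1, num_nodes + 1)]
    return "\n".join(lines) + "\n"


config = make_mom_config()
print(config, end="")
```

The output would then be copied into each node's pbs_mom config file (in TORQUE this is normally mom_priv/config under the PBS home directory).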


*****************************
All of the node* hosts are listed properly in /etc/hosts on every node.
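
One way to double-check this is to parse the hosts file and confirm that every $clienthost name has an address; a name with no entry is one possible reason an address might be missing from okclients. This is a sketch only: the function names and the sample file contents below are hypothetical (the real addresses use the 192.0.2.x documentation range, not the cluster's), and it reads /etc/hosts-format text directly rather than going through the system resolver.

```python
def parse_hosts(text):
    """Map each hostname/alias in /etc/hosts-format text to its IP address."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry seen for a name
    return mapping


def unresolved_clienthosts(config_text, hosts_text):
    """Return $clienthost names that have no entry in the hosts text."""
    hosts = parse_hosts(hosts_text)
    missing = []
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "$clienthost" and parts[1] not in hosts:
            missing.append(parts[1])
    return missing


# Hypothetical sample data; on a real node, pass the contents of
# /etc/hosts and of the pbs_mom config file instead.
sample_hosts = """\
127.0.0.1   localhost
192.0.2.1   head
192.0.2.2   node001
# node002 deliberately left out for the example
"""
sample_config = "$logevent 0x1ff\n$clienthost head\n$clienthost node001\n$clienthost node002\n"

print(unresolved_clienthosts(sample_config, sample_hosts))
```

With the sample data, only node002 is reported as missing, because its line in the hosts text is a comment.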

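However, as the error message above shows, this doesn't seem to "take":
only a few addresses (172.16.0.1, 172.16.0.2, 172.16.0.3, 172.16.0.20,
172.16.100.1 and 127.0.0.1) appear in the okclients list, and 172.16.0.27
is not among them.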

Is there some other configuration problem?  Or should I be using a
later patchlevel of torque?

Thanks for any help,

Peter Mardahl