[torquedev] problem with RM interface when RPP disabled
Lennart Karlsson
Lennart.Karlsson at nsc.liu.se
Fri Dec 9 00:30:09 MST 2005
Garrick,
You wrote:
> It seems that RM client programs (basically, just momctl) aren't cleaning
> up their local priv ports when TORQUE is built with --disable-rpp.
>
> Try running momctl 1000 times and it will start failing when
> you run out of priv ports:
>
> $ for a in `seq 1 1000`;do momctl -d 0 -h hpcjr0004 >/dev/null;done
> cannot connect to MOM on node 'hpcjr0004', errno=99 (Cannot assign requested address)
> cannot connect to MOM on node 'hpcjr0004', errno=99 (Cannot assign requested address)
> cannot connect to MOM on node 'hpcjr0004', errno=99 (Cannot assign requested address)
>
> This works fine when using RPP. Can anyone else duplicate this?
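A minimal sketch for checking this hypothesis, assuming the exhausted ports are local privileged ports (below 1024) that stay in TCP TIME_WAIT for a while after each momctl connection closes. On Linux, errno 99 is EADDRNOTAVAIL and errno 98 is EADDRINUSE, both consistent with a client that cannot find a free local port to bind. Since there are at most 1023 privileged ports, 1000 connections in quick succession could plausibly use up the range. The helper name `count_priv_time_wait` is hypothetical; it reads `netstat -tan` output, where the fourth column is the local address and the sixth is the socket state.

```shell
# Hypothetical helper: count TCP sockets in TIME_WAIT whose local port is
# privileged (< 1024). Reads `netstat -tan` style output on stdin.
count_priv_time_wait() {
  awk '$1 ~ /^tcp/ && $6 == "TIME_WAIT" {
         # Split the local address on ":"; the last piece is the port,
         # which also works for IPv6 addresses such as ":::22".
         n = split($4, addr, ":")
         if (addr[n] + 0 < 1024) c++
       }
       END { print c + 0 }'
}

# Usage, while or right after running the momctl loop:
#   netstat -tan | count_priv_time_wait
```

If the count climbs toward several hundred during the loop and drains again after a few minutes, that would support the idea that the ports are released by the kernel's TIME_WAIT expiry rather than being leaked outright.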
I am not using RPP, and with version 1.2.0p6-snap.1125811484 I get:
# for a in `seq 1 1000`;do /usr/pbs/sbin/momctl -d 0 -h n2 >/dev/null;done
ERROR: query[0] 'diag0' failed on n2 (errno: 98:98)
ERROR: query[0] 'diag0' failed on n2 (errno: 98:98)
ERROR: query[0] 'diag0' failed on n2 (errno: 98:98)
ERROR: query[0] 'diag0' failed on n2 (errno: 98:98)
ERROR: query[0] 'diag0' failed on n2 (errno: 98:98)
ERROR: query[0] 'diag0' failed on n2 (errno: 98:98)
ERROR: query[0] 'diag0' failed on n2 (errno: 98:98)
ERROR: query[0] 'diag0' failed on n2 (errno: 98:98)
ERROR: query[0] 'diag0' failed on n2 (errno: 98:98)
ERROR: query[0] 'diag0' failed on n2 (errno: 98:98)
[root@moonwatch maui]#
I can reproduce this if I wait a few minutes first, again getting only a few
(10) ERROR lines like these. But if I start a second such loop immediately
after the first one finishes, I get exactly 1000 ERROR lines.
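To compare runs like these without counting lines by eye, a small wrapper can tally the failures. This is a sketch: the function name `count_momctl_errors` is hypothetical, and it assumes, as the loop above suggests, that momctl prints its ERROR lines on stderr (they still appear even though stdout is sent to /dev/null). It matches the `ERROR:` prefix seen with 1.2.0p6; other versions may word the failure differently.

```shell
# Hypothetical wrapper: run momctl N times against one host and print how
# many runs produced an ERROR line.
count_momctl_errors() {
  _host=$1
  _runs=$2
  for _i in $(seq 1 "$_runs"); do
    # Order matters: first point stderr at the pipe, then discard stdout.
    momctl -d 0 -h "$_host" 2>&1 >/dev/null
  done | grep -c '^ERROR'
}

# Usage: run twice back to back, then once more after a pause.
#   count_momctl_errors n2 1000
```

Running it twice in a row and then again after waiting would show whether the failure count tracks how recently the previous batch of connections was closed.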
-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
National Supercomputer Centre in Linkoping, Sweden
http://www.nsc.liu.se