[torqueusers] (no subject)
Mike Poublon
michael.poublon at hope.edu
Wed Dec 7 10:46:17 MST 2005
Subject: HUGE log file created
If a node isn't listed in the server_priv/nodes file, the server won't accept
the node. This leads to excessively large log files (2 GB) and to pbs_server
crashing. I can reproduce the problem reliably. The large logs are created by
mom on the node trying to check in with the server many times per second
(1300+ on the machine I ran into this on).
Shouldn't there be a delay between connection attempts? I looked at the code in
the src/resmom directory but am not familiar with how things work.
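
To illustrate the kind of delay I have in mind, here is a minimal sketch of a
capped exponential backoff in C. It is not taken from the TORQUE source: the
function name retry_delay and its parameters are hypothetical, and I am only
assuming that mom's check-in loop could wait this many seconds before each
retry.

```c
#include <assert.h>

/* Hypothetical helper, not part of TORQUE.
 * Returns how many seconds to wait before retry number `attempt`
 * (0-based). The delay starts at base_sec, doubles on every further
 * attempt, and never exceeds max_sec. */
unsigned int retry_delay(unsigned int attempt,
                         unsigned int base_sec,
                         unsigned int max_sec)
{
    /* Preconditions: a zero base would never grow, and the cap
     * must not be smaller than the starting delay. */
    assert(base_sec > 0 && max_sec >= base_sec);

    unsigned int delay = base_sec;
    while (attempt-- > 0 && delay < max_sec) {
        /* Doubling would pass the cap (or overflow): clamp instead. */
        if (delay > max_sec / 2)
            return max_sec;
        delay *= 2;
    }
    return delay;
}
```

With base_sec = 1 and max_sec = 60, a node that keeps getting rejected would
wait 1, 2, 4, 8, 16, 32 and then 60 seconds between attempts (for example by
passing the result to sleep() from <unistd.h>), instead of retrying over a
thousand times per second.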
I know there is an easy solution to this (list all the nodes in the nodes
file), but shouldn't mom be a little more robust?
Thanks for any input on this