[torqueusers] Strange problem in Torque

Peter Wyckoff wyckoff at yahoo-inc.com
Sat Aug 4 17:26:49 MDT 2007


Hi Garrick,

We are running version 2.1.8. Does this version have the okclients problem?

Thanks, pete


On 8/4/07 8:52 AM, "Garrick Staples" <garrick at usc.edu> wrote:

> On Fri, Aug 03, 2007 at 06:24:53PM -0700, Peter Wyckoff alleged:
>> 
>> Last week we were seeing about a third of our cluster's pbs_moms not
>> trusting the rest of the cluster - i.e., always saying unauthorized request
>> when a sister mom tried to involve them in a job. See included snippet from
>> the logs.
>> 
>> We restarted the pbs_server and all the pbs_moms and the problem was
>> resolved. 
>> 
>> The only thing I can think of is that earlier in the day, we had too many
>> queues for the default maui (had 18 and the #define says the max is 12) and
>> maui was having problems. We deleted the queues and restarted maui which
>> fixed the maui problem. But, could maui have caused torque to get in a bad
>> state?
>> 
>> Any help would be appreciated.
>> 
>> Thanks, pete
>> 
>> 
>> Jul 24 20:12:17 kry1776 pbs_mom: im_request, bad connect from
>> 72.30.63.43:1023 - unauthorized (okclients:
>> 72.30.62.70,72.30.62.72,72.30.62.71,72.30.62.73,72.30.62.75,72.30.62.77,
>> 72.30.62.79,72.30.62.78,72.30.62.81,72.
> 
> Just restarting pbs_server is probably enough for this error, and no, maui
> probably doesn't have anything to do with it.
> 
> What version of torque?  Earlier versions had some bugs with the okclients
> list.
> 
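
For anyone hitting the same "bad connect ... unauthorized" message, here is a
minimal Python sketch for checking whether the rejected sister's address shows
up in the okclients list that the mom printed. The regular expression is
modeled only on the one log line quoted above, so other Torque versions may
format the message differently; the function names and the SAMPLE string are
illustrative and not part of Torque. The quoted log is cut off after "72.", so
a "missing" result from a truncated line is only a hint, not proof.

```python
import re

# Pattern modeled on the single pbs_mom log line quoted in this thread.
# Assumption: other Torque versions may word this message differently.
BAD_CONNECT = re.compile(
    r"bad connect from\s+(?P<ip>\d+\.\d+\.\d+\.\d+):(?P<port>\d+)"
    r"\s+-\s+unauthorized\s+\(okclients:\s*(?P<clients>[\d.,\s]*)"
)
DOTTED_QUAD = re.compile(r"\d+\.\d+\.\d+\.\d+")


def parse_bad_connect(line):
    """Return (source_ip, okclients) for a rejected im_request, else None."""
    m = BAD_CONNECT.search(line)
    if m is None:
        return None
    entries = [c.strip() for c in m.group("clients").split(",")]
    # Keep only complete addresses; a truncated tail such as "72." is dropped.
    clients = [c for c in entries if DOTTED_QUAD.fullmatch(c)]
    return m.group("ip"), clients


def source_missing_from_okclients(line):
    """True if the rejected address is absent from the visible okclients list."""
    parsed = parse_bad_connect(line)
    if parsed is None:
        return None
    ip, clients = parsed
    return ip not in clients


# The log line from this thread, rejoined onto one line (still truncated).
SAMPLE = (
    "Jul 24 20:12:17 kry1776 pbs_mom: im_request, bad connect from "
    "72.30.63.43:1023 - unauthorized (okclients: "
    "72.30.62.70,72.30.62.72,72.30.62.71,72.30.62.73,72.30.62.75,"
    "72.30.62.77,72.30.62.79,72.30.62.78,72.30.62.81,72."
)

if __name__ == "__main__":
    print(parse_bad_connect(SAMPLE))
    print(source_missing_from_okclients(SAMPLE))
```

Running it over the mom logs from several nodes would show whether the same
sisters are missing everywhere or only on some moms.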


