[Mauiusers] MAUI not responding - "lost connection to server"
Gianfranco Sciacca
gs at hep.ucl.ac.uk
Tue Dec 16 06:30:25 MST 2008
Adrian Sevcenco wrote:
> Greenseid, Joseph M. wrote:
>
>> It says OK when it is starting up. Does it not actually start? Is
>> there a Maui process running after you do this?
>>
> Yes, there is a process, but when I try to run any Maui-related command I
> get:
> [root@grid01 log]# checkjob 2
> ERROR: lost connection to server
> ERROR: cannot request service (status)
> I have attached the log from starting Maui (log level 9).
> Can anybody see the problem there?
> Thank you,
> Adrian
>
Adrian, are you running nscd by any chance? We have noticed on many of our
clients and servers that the nscd process tends to go haywire from time
to time and cause all sorts of problems, including the one you mention.
The telltale sign would be nscd using 100% CPU on your grid01 machine.
Perhaps not your case, but worth checking.
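`top` is the easiest way to see this live. As a rough scriptable alternative, the sketch below lists nscd processes and flags any above a CPU threshold. The helper name `classify_nscd` and the 90% default are arbitrary choices for illustration. Also note that the `pcpu` value from `ps` is usually averaged over the process lifetime, so a long-running nscd that only recently started spinning may still show a low figure.

```shell
#!/bin/sh
# Hypothetical helper: reads "pid pcpu comm" lines on stdin and reports
# every nscd process, marking it HIGH if its CPU share >= threshold.
# The default threshold of 90 (percent) is an arbitrary assumption.
classify_nscd() {
    awk -v t="${1:-90}" '
        $3 == "nscd" {
            found = 1
            state = ($2 + 0 >= t) ? "HIGH" : "ok"
            printf "nscd pid %s cpu %s%% %s\n", $1, $2, state
        }
        END { if (!found) print "nscd not running" }'
}

# Feed it the live process table; "=" after each field suppresses the header.
# pcpu here is a lifetime average, so cross-check with top for current usage.
ps -eo pid=,pcpu=,comm= | classify_nscd 90
```

If it reports HIGH, restarting nscd (or stopping it while testing) and then retrying `checkjob` would show whether it is the culprit.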
cheers,
Gianfranco
>
>>
>> --Joe
>>
>> ------------------------------------------------------------------------
>> *From:* mauiusers-bounces at supercluster.org on behalf of Adrian Sevcenco
>> *Sent:* Mon 12/15/2008 12:56 PM
>> *To:* mauiusers at supercluster.org
>> *Subject:* [Mauiusers] MAUI not responding - "lost connection to server"
>>
>> Hi,
>> I have a strange situation:
>> when I try to restart the Maui server I get:
>> [root@grid01 /]# service maui restart
>> Shutting down MAUI Scheduler: ERROR: lost connection to server
>> ERROR: cannot request service (status)
>> [FAILED]
>> Starting MAUI Scheduler: [ OK ]
>>
>> The same happens with the firewall down.
>> My configuration is:
>>
>> [root@grid01 maui]# cat maui.cfg
>> # MAUI configuration example
>>
>> SERVERHOST grid01.spacescience.ro
>> ADMIN1 root
>> ADMIN3 edginfo rgma edguser
>> ADMINHOSTS grid01.spacescience.ro
>> RMCFG[base] TYPE=PBS
>> SERVERPORT 40559
>> SERVERMODE NORMAL
>>
>> # Set PBS server polling interval. If you have short queues and/or
>> # jobs, it is worth setting a short interval (10 seconds).
>>
>> RMPOLLINTERVAL 00:00:10
>>
>> # a max. 10 MByte log file in a logical location
>>
>> LOGFILE /var/log/maui.log
>> LOGFILEMAXSIZE 10000000
>> LOGLEVEL 1
>>
>> # Set the delay to 1 minute before Maui tries to run a job again,
>> # in case it failed to run the first time.
>> # The default value is 1 hour.
>>
>> DEFERTIME 00:01:00
>>
>> # Necessary for MPI grid jobs
>> ENABLEMULTIREQJOBS TRUE
>>
>> Any ideas why it is not working? How can I debug this further?
>> Is there a requirement for something to be in /etc/hosts?
>> Thank you,
>> Adrian
>>
>>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
>
--
Dr. Gianfranco Sciacca Tel: +44 (0)20 7679 3044
Dept of Physics and Astronomy Internal: 33044
University College London D15 - Physics Building
London WC1E 6BT