[Mauiusers] MAUI not responding - "lost connection to server"

Gianfranco Sciacca gs at hep.ucl.ac.uk
Tue Dec 16 06:30:25 MST 2008


Adrian Sevcenco wrote:
> Greenseid, Joseph M. wrote:
>   
>> it says ok for when it is starting up.  does it not actually start?  is
>> there a maui process running after you do this?
>>     
> yes, it has a process but when i try to do any command related to maui i
>  have :
> [root@grid01 log]# checkjob 2
> ERROR:    lost connection to server
> ERROR:    cannot request service (status)
> I attached the log(9) of starting maui.
> Can somebody see the problem there?
> Thank you,
> Adrian
>   
Adrian, are you running nscd by any chance? We have noticed on many of our 
clients and servers that the nscd process tends to go haywire from time 
to time and cause all sorts of problems, including the one you mention. 
The telltale sign would be nscd using 100% CPU on your grid01 machine. 
Perhaps not your case, but worth checking.
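
A quick way to look for this, as a rough sketch only: the helper below
(busy_nscd is a made-up name, not part of nscd or Maui) filters ps output
for nscd processes at or above a CPU threshold, 90% by default. Bear in
mind that the pcpu figure from ps is averaged over the lifetime of the
process, so top may show a recent runaway more clearly.

```shell
#!/bin/sh
# busy_nscd: hypothetical helper, not shipped with nscd or Maui.
# Reads lines of "pid pcpu comm" on stdin and prints those whose command
# is nscd and whose CPU usage is at or above the limit (default: 90).
busy_nscd() {
    awk -v limit="${1:-90}" \
        '$3 == "nscd" && $2 + 0 >= limit + 0 { print $1, $2, $3 }'
}

# On the machine itself (grid01 in this thread), feed it live ps output:
#   ps -e -o pid= -o pcpu= -o comm= | busy_nscd 90
```

No output means no nscd process is above the threshold; any line printed
is a candidate for a closer look.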

cheers,
Gianfranco
>   
>>  
>> --Joe
>>
>> ------------------------------------------------------------------------
>> *From:* mauiusers-bounces at supercluster.org on behalf of Adrian Sevcenco
>> *Sent:* Mon 12/15/2008 12:56 PM
>> *To:* mauiusers at supercluster.org
>> *Subject:* [Mauiusers] MAUI not responding - "lost connection to server"
>>
>> Hi,
>> I have a strange situation :
>> when i try to restart the maui server i have :
>> [root@grid01 /]# service maui restart
>> Shutting down MAUI Scheduler: ERROR:    lost connection to server
>> ERROR:    cannot request service (status)
>>                                                            [FAILED]
>> Starting MAUI Scheduler:                                   [  OK  ]
>>
>> The same with firewall down.
>> as configuration i have this :
>>
>> [root@grid01 maui]# cat maui.cfg
>> # MAUI configuration example
>>
>> SERVERHOST              grid01.spacescience.ro
>> ADMIN1                  root
>> ADMIN3                  edginfo rgma edguser
>> ADMINHOSTS              grid01.spacescience.ro
>> RMCFG[base]             TYPE=PBS
>> SERVERPORT              40559
>> SERVERMODE              NORMAL
>>
>> # Set PBS server polling interval. If you have short queues or/and
>> # jobs it is worth to set a short interval. (10 seconds)
>>
>> RMPOLLINTERVAL        00:00:10
>>
>> # a max. 10 MByte log file in a logical location
>>
>> LOGFILE               /var/log/maui.log
>> LOGFILEMAXSIZE        10000000
>> LOGLEVEL              1
>>
>> # Set the delay to 1 minute before Maui tries to run a job again,
>> # in case it failed to run the first time.
>> # The default value is 1 hour.
>>
>> DEFERTIME       00:01:00
>>
>> # Necessary for MPI grid jobs
>> ENABLEMULTIREQJOBS TRUE
>>
>> Any ideas why it is not working? how can i debug this further?
>> is there a requirement of something to be in /etc/hosts ?
>> Thank you,
>> Adrian
>>
>>     


-- 
Dr. Gianfranco Sciacca			Tel: +44 (0)20 7679 3044
Dept of Physics and Astronomy		Internal: 33044
University College London		D15 - Physics Building
London WC1E 6BT


