[Mauiusers] MAUI not responding - "lost connection to server"

Adrian Sevcenco Adrian.Sevcenco at cern.ch
Tue Dec 16 07:09:07 MST 2008


Gianfranco Sciacca wrote:
> Adrian Sevcenco wrote:
>> Greenseid, Joseph M. wrote:
>>  
>>> It says OK when it is starting up. Does it not actually start? Is
>>> there a maui process running after you do this?
>>>     
>> Yes, there is a maui process, but when I run any Maui command I get:
>> [root@grid01 log]# checkjob 2
>> ERROR:    lost connection to server
>> ERROR:    cannot request service (status)
>> I attached the log(9) of starting maui.
>> Can somebody see the problem there?
>> Thank you,
>> Adrian
>>   
> Adrian, are you running nscd by any chance? We have noticed on many of our
> clients and servers that the nscd process tends to go haywire from time
> to time and cause all sorts of problems, including the one you mention.
> The tell-tale would be nscd using 100% CPU on your grid01 machine.
> Perhaps not your case, but worth checking.
Hi, and thanks for the tip, but we don't have nscd on this machine.
Adrian
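
For anyone wanting to rule out the name-resolution and port questions raised in this thread, here is a minimal Python sketch. It is not part of Maui and not a confirmed diagnosis: the helper names parse_maui_cfg and check_server are made up for illustration, and it assumes the client commands reach the scheduler over plain TCP at SERVERHOST:SERVERPORT as given in maui.cfg. It only tests whether the host name resolves through the system resolver (which normally consults /etc/hosts) and whether something accepts a connection on that port.

```python
import socket

# Settings copied from the maui.cfg quoted in this thread.
SAMPLE_CFG = """\
# MAUI configuration example
SERVERHOST              grid01.spacescience.ro
ADMINHOSTS              grid01.spacescience.ro
RMCFG[base]             TYPE=PBS
SERVERPORT              40559
"""


def parse_maui_cfg(text):
    """Hypothetical helper: collect 'KEY value' lines, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def check_server(host, port, timeout=3.0):
    """Hypothetical helper: resolve host, then try a TCP connect. Returns (ok, message)."""
    try:
        addr = socket.gethostbyname(host)
    except OSError as exc:
        return False, f"cannot resolve {host}: {exc}"
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True, f"{host} ({addr}) accepts connections on port {port}"
    except OSError as exc:
        return False, f"{host} ({addr}) port {port}: {exc}"


if __name__ == "__main__":
    cfg = parse_maui_cfg(SAMPLE_CFG)
    ok, message = check_server(cfg["SERVERHOST"], int(cfg["SERVERPORT"]))
    print("OK  " if ok else "FAIL", message)
```

Run on the scheduler host itself, a "cannot resolve" result would point at /etc/hosts or DNS, while a refused connection would suggest the daemon is not listening on the configured port. A successful connect only shows that the port is open, not that Maui is answering requests correctly.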


> cheers,
> Gianfranco
>>  
>>>  
>>> --Joe
>>>
>>> ------------------------------------------------------------------------
>>> *From:* mauiusers-bounces at supercluster.org on behalf of Adrian Sevcenco
>>> *Sent:* Mon 12/15/2008 12:56 PM
>>> *To:* mauiusers at supercluster.org
>>> *Subject:* [Mauiusers] MAUI not responding - "lost connection to server"
>>>
>>> Hi,
>>> I have a strange situation:
>>> when I try to restart the maui server I get:
>>> [root@grid01 /]# service maui restart
>>> Shutting down MAUI Scheduler: ERROR:    lost connection to server
>>> ERROR:    cannot request service (status)
>>>                                                            [FAILED]
>>> Starting MAUI Scheduler:                                   [  OK  ]
>>>
>>> The same happens with the firewall down.
>>> My configuration is:
>>>
>>> [root@grid01 maui]# cat maui.cfg
>>> # MAUI configuration example
>>>
>>> SERVERHOST              grid01.spacescience.ro
>>> ADMIN1                  root
>>> ADMIN3                  edginfo rgma edguser
>>> ADMINHOSTS              grid01.spacescience.ro
>>> RMCFG[base]             TYPE=PBS
>>> SERVERPORT              40559
>>> SERVERMODE              NORMAL
>>>
>>> # Set the PBS server polling interval. If you have short queues and/or
>>> # jobs, it is worth setting a short interval (10 seconds).
>>>
>>> RMPOLLINTERVAL        00:00:10
>>>
>>> # a max. 10 MByte log file in a logical location
>>>
>>> LOGFILE               /var/log/maui.log
>>> LOGFILEMAXSIZE        10000000
>>> LOGLEVEL              1
>>>
>>> # Set the delay to 1 minute before Maui tries to run a job again,
>>> # in case it failed to run the first time.
>>> # The default value is 1 hour.
>>>
>>> DEFERTIME       00:01:00
>>>
>>> # Necessary for MPI grid jobs
>>> ENABLEMULTIREQJOBS TRUE
>>>
>>> Any ideas why it is not working? How can I debug this further?
>>> Does something need to be in /etc/hosts?
>>> Thank you,
>>> Adrian
>>>