[Mauiusers] MAUI not responding - "lost connection to server"

Adrian Sevcenco Adrian.Sevcenco at cern.ch
Thu Mar 12 16:30:31 MDT 2009


Ben Shepler wrote:
> Dear Adrian,
Hi,

> I found your description of this problem in the archives.  We have just
> experienced this same problem, and were wondering if you found a solution?
I can't say that I found one. I just reinstalled the maui and torque
packages (being careful to remove all leftover config files after the rpm
removal). The packages in question were those from the gLite grid middleware.
After reconfiguration everything went smoothly. I should note that we
hit the same situation twice, with the same solution both times: reinstall
and reconfigure.
All I can think of is that the actual problem was with torque, which had
hung somehow, so maui could not connect to it to query the jobs.
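
If it happens again, a quick check before reinstalling is to see whether
both daemons still accept TCP connections. Below is a minimal sketch of
such a probe, not something we actually ran at the time. The port numbers
are assumptions: 40559 is the SERVERPORT from the maui.cfg quoted below,
and 15001 is the usual pbs_server default, which may differ on your site.
The helper names port_open and check_services are made up for this
example. Note that a hung daemon can still accept connections, so a
"reachable" result does not prove the service is healthy; a refused or
timed-out connection is the more telling outcome.

```python
import socket
import sys

# Assumed ports: 40559 is SERVERPORT from the maui.cfg quoted below;
# 15001 is the usual pbs_server default and may differ on your site.
PORTS = {"maui": 40559, "pbs_server": 15001}


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_services(host, ports=PORTS):
    """Map each service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in ports.items()}


# Only probes when a hostname is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    for name, ok in check_services(sys.argv[1]).items():
        print(f"{name}: {'reachable' if ok else 'NOT reachable'}")
```

Saved under any file name and run with the server's hostname as the only
argument, it prints one line per service. Because name-resolution failures
are also reported as "NOT reachable", it doubles as a rough check that the
SERVERHOST name resolves at all.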

Sorry for not being able to help you more,
Best regards,
Adrian

-------------------------------------------------------
Adrian Sevcenco - Institute of Space Sciences, Romania
-------------------------------------------------------



> best regards,
> Ben Shepler
> 
> Gianfranco Sciacca wrote:
>> Adrian Sevcenco wrote:
>>> Greenseid, Joseph M. wrote:
>>>  
>>>> it says ok for when it is starting up.  does it not actually start?  is
>>>> there a maui process running after you do this?
>>>>     
>>> yes, there is a process, but when I try to run any maui-related command I
>>> get:
>>> [r... at grid01 log]# checkjob 2
>>> ERROR:    lost connection to server
>>> ERROR:    cannot request service (status)
>>> I attached the log(9) of starting maui.
>>> Can somebody see the problem there?
>>> Thank you,
>>> Adrian
>>>   
>> Adrian, are you running nscd by any chance? We have noticed on many of our
>> clients and servers that the nscd process tends to go haywire from time
>> to time and cause all sorts of problems, including the one you mention.
>> The tell-tale would be nscd using 100% CPU on your grid01 machine.
>> Perhaps not your case, but worth checking.
> Hi and thanks for the tip but we don't have nscd on this machine.
> Adrian
> 
> 
>> cheers,
>> Gianfranco
>>>  
>>>>  
>>>> --Joe
>>>>
>>>> ------------------------------------------------------------------------
>>>> *From:* mauiusers-boun... at supercluster.org on behalf of Adrian Sevcenco
>>>> *Sent:* Mon 12/15/2008 12:56 PM
>>>> *To:* mauiusers at supercluster.org <mailto:mauiusers at supercluster.org>
>>>> *Subject:* [Mauiusers] MAUI not responding - "lost connection to server"
>>>>
>>>> Hi,
>>>> I have a strange situation :
>>>> when i try to restart the maui server i have :
>>>> [r... at grid01 /]# service maui restart
>>>> Shutting down MAUI Scheduler: ERROR:    lost connection to server
>>>> ERROR:    cannot request service (status)
>>>>                                                            [FAILED]
>>>> Starting MAUI Scheduler:                                   [  OK  ]
>>>>
>>>> The same with firewall down.
>>>> as configuration i have this :
>>>>
>>>> [r... at grid01 maui]# cat maui.cfg
>>>> # MAUI configuration example
>>>>
>>>> SERVERHOST              grid01.spacescience.ro
>>>> ADMIN1                  root
>>>> ADMIN3                  edginfo rgma edguser
>>>> ADMINHOSTS              grid01.spacescience.ro
>>>> RMCFG[base]             TYPE=PBS
>>>> SERVERPORT              40559
>>>> SERVERMODE              NORMAL
>>>>
>>>> # Set PBS server polling interval. If you have short queues and/or
>>>> # jobs it is worth setting a short interval (10 seconds).
>>>>
>>>> RMPOLLINTERVAL        00:00:10
>>>>
>>>> # a max. 10 MByte log file in a logical location
>>>>
>>>> LOGFILE               /var/log/maui.log
>>>> LOGFILEMAXSIZE        10000000
>>>> LOGLEVEL              1
>>>>
>>>> # Set the delay to 1 minute before Maui tries to run a job again,
>>>> # in case it failed to run the first time.
>>>> # The default value is 1 hour.
>>>>
>>>> DEFERTIME       00:01:00
>>>>
>>>> # Necessary for MPI grid jobs
>>>> ENABLEMULTIREQJOBS TRUE
>>>>
>>>> Any ideas why it is not working? How can I debug this further?
>>>> Is there a requirement for something to be in /etc/hosts?
>>>> Thank you,
>>>> Adrian
>>>>
>>>>     
>>>
>>>
>>>  
>>> ------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> mauiusers mailing list
>>> mauiusers at supercluster.org <mailto:mauiusers at supercluster.org>
>>> _http://www.supercluster.org/mailman/listinfo/mauiusers_
>>>
> 
> 