SOLVED: Re: [Mauiusers] maui start problem

Stijn De Weirdt stijn.deweirdt at ugent.be
Sun Jul 13 12:58:28 MDT 2008


hi all,

after some more digging through logfiles, i found a 'cannot locate
timestamp' message, which led to the following posting on how to fix it:
http://www.supercluster.org/pipermail/mauiusers/2007-May/002713.html


(for centos5, the FORTIFY_SOURCE option can be found in  
/usr/lib/rpm/redhat/macros)
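
to see where the option is set on a given box, something like the following sketch can help. it only lists the lines that mention FORTIFY_SOURCE; the path is the centos5 one from above, and the helper name find_option is made up for this example. what to change in those lines is described in the linked posting, not here.

```python
import os


def find_option(path, needle="FORTIFY_SOURCE"):
    """Return (line_number, line) pairs for every line that mentions needle."""
    hits = []
    with open(path, errors="replace") as handle:
        for number, line in enumerate(handle, start=1):
            if needle in line:
                hits.append((number, line.rstrip("\n")))
    return hits


# Path taken from the post (centos5); other distributions may differ.
MACROS = "/usr/lib/rpm/redhat/macros"

if os.path.exists(MACROS):
    for number, line in find_option(MACROS):
        print(f"{MACROS}:{number}: {line}")
else:
    print(f"{MACROS} not found on this system")
```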

hope this can help others.

stijn


> hi all,
>
> i had some time to look a bit further into it.
>
> the good news is that the scheduling works (and that i know that i can
> ignore the  'Resource temporarily unavailable' messages).
>
> the bad news is that showq (or any other maui command) still fails.
>
> strace of showq gives
> ...
> connect(3, {sa_family=AF_INET, sin_port=htons(40559), sin_addr=inet_addr("192.168.10.1")}, 16) = -1 EINPROGRESS (Operation now in progress)
> ...
> sendto(3, "00000057\nCK=4fa43eb400e5e9d7  DT=CMD=showq AUTH=root ARG=0 ALL 0 \n", 66, 0, NULL, 0) = 66
> select(4, [3], NULL, NULL, {30, 0})     = 1 (in [3], left {29, 893000})
> recvfrom(3, "", 9, 0, NULL, NULL)       = 0
> write(2, "ERROR:    lost connection to server\n", 36) = 36
>
> strace of maui during that try gives:
> ...
> select(10, [9], NULL, NULL, {5, 0})     = 1 (in [9], left {5, 0})
> recvfrom(9, "00000057\n", 9, 0, NULL, NULL) = 9
> select(10, [9], NULL, NULL, {5, 0})     = 1 (in [9], left {5, 0})
> recvfrom(9, "CK=4fa43eb400e5e9d7  DT=CMD=showq AUTH=root ARG=0 ALL 0 \n", 57, 0, NULL, NULL) = 57
> close(9)                                = 0
> ...
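
read together, the two traces suggest that each request is a 9-byte header (8 digits plus a newline) followed by a payload of that many bytes: the client sends 66 bytes, and maui reads 9 and then 57. maui then closes fd 9 without writing a reply, which is why showq's recvfrom returns 0 and it reports 'lost connection to server'. below is a minimal sketch of that framing. it is inferred from the strace output only, not from the maui sources; the decimal length and the CK=/DT= split are assumptions, and the function names are made up for this example.

```python
def split_frame(data: bytes):
    """Split one captured message into (declared_length, payload)."""
    header, payload = data[:9], data[9:]
    if len(header) != 9 or not header.endswith(b"\n"):
        raise ValueError("expected 8 digits followed by a newline")
    declared = int(header[:8])  # assumption: the length is decimal
    if declared != len(payload):
        raise ValueError(f"header says {declared} bytes, got {len(payload)}")
    return declared, payload


def parse_payload(payload: bytes):
    """Split the payload into the CK= value and everything after DT=."""
    text = payload.decode("ascii").rstrip("\n")
    checksum_part, separator, data_part = text.partition("DT=")
    checksum_part = checksum_part.strip()
    if not separator or not checksum_part.startswith("CK="):
        raise ValueError("payload does not look like 'CK=...  DT=...'")
    return {"CK": checksum_part[3:], "DT": data_part}


# The exact bytes showq sent in the strace above.
frame = b"00000057\nCK=4fa43eb400e5e9d7  DT=CMD=showq AUTH=root ARG=0 ALL 0 \n"

length, payload = split_frame(frame)
fields = parse_payload(payload)
print(length, fields["CK"], repr(fields["DT"]))
```

the header and payload lengths line up with what both sides report, so the request itself arrives intact; the failure happens after maui has read it.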
>
>
> thanks,
>
>
> stijn
>
>> symptom:
>> submitted jobs stay queued, showq/checkjob commands fail.
> the two symptoms are not related.
>
> the fact that the scheduling didn't work seems to be due to the following
> line in my maui.cfg (which i copied from a working setup that was using
> a previous snapshot):
>
> SYSCFG[base] PLIST=
>
> setting LOGLEVEL to 9 and carefully reading the important messages gave
> some hints that all the connections to torque were working fine, but
> that the jobs were held by something else.
>
>
>>
>> (using LOGLEVEL 9):
>> in /var/log/maui.log:
>>
>> 07/10 16:35:05 INFO:     no PBS sched socket connections ready
>> 07/10 16:35:05 MSUAcceptClient(5,ClientSD,HostName,TCP)
>> 07/10 16:35:05 INFO:     accept call failed, errno: 11 (Resource temporarily unavailable)
>> 07/10 16:35:05 INFO:     all clients connected.  servicing requests
>
> reading the log files more carefully, fd 5 is the listening socket on
> port 40559, and this message simply means that nothing is connecting to
> it at that moment. (eg telnet localhost 40559 shows something)
>
>
>



