[Mauiusers] Most "stable" version of Maui
Bas van der Vlies
basv at sara.nl
Fri Jan 11 10:26:56 MST 2008
Michael Barnes wrote:
> On Fri, Jan 11, 2008 at 04:57:16PM +0100, Bas van der Vlies wrote:
>> Michael Barnes wrote:
>>> Maui users,
>>>
>> Michael,
>>
>> Try the latest snapshot of maui (maui-3.2.6p20-snap.1182974819). If I
>> remember correctly, there is a bug in maui-3.2.6p19: a patch was not applied
>> correctly, and therefore you get a segv.
>>
>> I am also running the latest snapshot without any problems.
>
> Maybe this is a Fedora Core 7 thing. I just compiled and installed this
> snapshot. This is how I ran configure:
>
> # these are the same flags that all of the FC7 RPMs use
>
> export CFLAGS="-D__M64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic"
>
> cd maui-3.2.6p20/
>
> ./configure --prefix=/usr/local
>
>
> And it ran 2 jobs, and now it's acting funny.
>
> Sometimes jobs will run, sometimes not.
>
> I also get this:
>
> checkjob 169101.pbsold
> ERROR: lost connection to server
> ERROR: cannot request service (status)
>
>
> Same with:
>
> showq
> ERROR: lost connection to server
> ERROR: cannot request service (status)
>
>
> I do an strace on the running maui process and I see:
>
> select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
> select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
> accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
> select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
> select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
> accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
> select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
> select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
> accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
> select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
> select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
> accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
> select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
> select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
> accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
>
> over and over again.
>
> An strace on the client command says this many times (as root and me):
>
> bind(6, {sa_family=AF_INET, sin_port=htons(831), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
>
>
> I see nothing similar with the working version (meaning there is no bind()
> call).
>
>
>
> I don't know what else to try besides reinstalling the OS in 32bit mode,
> which is not a big deal. But if anybody has any suggestions, I'm open
> to them.
>
> Another piece of information is that I am running the pbs_server in
> debug mode, but AFAIK, this only keeps it from forking and it dumps out
> some stuff on the terminal.
>
>
>
> I don't know what more to try.
>
>
>
> -mb
>
>
>
> --
> +-----------------------------------------------
> | Michael Barnes
> |
> | Thomas Jefferson National Accelerator Facility
> | 12000 Jefferson Ave.
> | Newport News, VA 23606
> | (757) 269-7634
> +-----------------------------------------------
Did you try setting the loglevel to 9 and checking maui.log for error
messages? All tools, including the client tools (diagnose, checkjob, ...),
communicate with the server, so if the server (maui) crashes, nothing
works anymore.
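
A rough sketch of how the loglevel could be raised, in case it helps. The
helper name set_loglevel is made up for illustration, the paths in the
comments are assumptions that depend on where maui was installed, and
`sed -i` assumes GNU sed:

```shell
# Hypothetical helper: set LOGLEVEL in a maui.cfg file.
#   $1 = path to maui.cfg, $2 = desired level
set_loglevel() {
    cfg=$1
    level=$2
    if grep -q '^LOGLEVEL' "$cfg"; then
        # A LOGLEVEL line already exists: rewrite it in place (GNU sed).
        sed -i "s/^LOGLEVEL.*/LOGLEVEL $level/" "$cfg"
    else
        # No LOGLEVEL line yet: append one.
        echo "LOGLEVEL $level" >> "$cfg"
    fi
}

# Example use (paths are assumptions, adjust to your install):
#   set_loglevel /usr/local/maui/maui.cfg 9
#   ... restart maui, reproduce the problem, then look at the log:
#   grep -iE 'error|alert' /usr/local/maui/log/maui.log | tail -n 50
```

Level 9 is very verbose, so it is probably best to set it back once the
problem has been reproduced.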
It seems like maui closes the socket; maybe it is IPv6-related or something.
Just a guess.
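
One way to check the socket guess is to see whether anything is listening
on the maui server port and on which address. This is only a sketch: the
helper name get_serverport is made up, the maui.cfg path is an assumption,
and if maui.cfg has no SERVERPORT line the helper prints nothing, because
maui then uses its built-in default port:

```shell
# Hypothetical helper: print the SERVERPORT value from a maui.cfg file.
#   $1 = path to maui.cfg
get_serverport() {
    # Print the second field of the first line whose first field is SERVERPORT.
    awk '$1 == "SERVERPORT" { print $2; exit }' "$1"
}

# Example use (path is an assumption, adjust to your install):
#   port=$(get_serverport /usr/local/maui/maui.cfg)
#   netstat -tln | grep ":$port "
# The local address column shows whether the listener is bound to an
# IPv4 or an IPv6 address.
```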
Regards
--
********************************************************************
* *
* Bas van der Vlies e-mail: basv at sara.nl *
* SARA - Academic Computing Services phone: +31 20 592 8012 *
* Kruislaan 415 fax: +31 20 6683167 *
* 1098 SJ Amsterdam *
* *
********************************************************************