[Mauiusers] Most "stable" version of Maui
Michael Barnes
Michael.Barnes at jlab.org
Fri Jan 11 09:38:44 MST 2008
On Fri, Jan 11, 2008 at 04:57:16PM +0100, Bas van der Vlies wrote:
> Michael Barnes wrote:
> >Maui users,
> >
> Michael,
>
> Try the latest snapshot of maui (maui-3.2.6p20-snap.1182974819). If I
> remember correctly, there is a bug in maui-3.2.6p19: a patch was not applied
> correctly, and therefore you get a segv.
>
> I am also running the latest snapshot without any problems.
Maybe this is a Fedora Core 7 thing. I just compiled and installed this
snapshot. This is how I ran configure:
# these are the same flags that all of the FC7 RPMs use
export CFLAGS="-D__M64 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic"
cd maui-3.2.6p20/
./configure --prefix=/usr/local
It ran 2 jobs, and now it's acting erratically:
sometimes jobs will run, sometimes not.
I also get this:
checkjob 169101.pbsold
ERROR: lost connection to server
ERROR: cannot request service (status)
Same with:
showq
ERROR: lost connection to server
ERROR: cannot request service (status)
When I run strace on the running maui process, I see:
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
select(0, NULL, NULL, NULL, {0, 100000}) = 0 (Timeout)
select(1024, [8], NULL, NULL, {0, 10000}) = 0 (Timeout)
accept(5, 0x7fff83ee6510, [9506649594159693840]) = -1 EAGAIN (Resource temporarily unavailable)
This repeats over and over again.
An strace on the client command (run both as root and as my own user) shows this many times:
bind(6, {sa_family=AF_INET, sin_port=htons(831), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
I see nothing similar when tracing the working version (it makes no bind()
call at all).
Besides reinstalling the OS in 32-bit mode, which is not a big deal, I'm
out of ideas. If anybody has any suggestions, I'm open
to them.
One more piece of information: I am running pbs_server in debug mode,
but AFAIK this only keeps it from forking and makes it dump some output
to the terminal.
I don't know what more to try.
-mb
--
+-----------------------------------------------
| Michael Barnes
|
| Thomas Jefferson National Accelerator Facility
| 12000 Jefferson Ave.
| Newport News, VA 23606
| (757) 269-7634
+-----------------------------------------------