[Mauiusers] Re: maui & old gcc optimizer bug.

Chris Johnson johnson at nmr.mgh.harvard.edu
Wed Nov 16 08:10:02 MST 2005

On Wed, 16 Nov 2005, Chris Johnson wrote:

>     Hi all,
>     Have a little annoyance here which is driving me up the wall.  I
> have two mini maui clusters with torque running.  The well behaved one
> is on CentOS 4.1 with opteron architecture.  The less than ideal one
> is on FC2 with P-III hardware.
>     The one on the opterons is running terrific.
>     The one on the P-III's gives me maui.log errors like this
> (rc: 15041  errmsg: 'Execution server rejected request MSG=send failed, 
> STARTING'  hostlist: 'node15')
> and doesn't run jobs very often, putting them in deferred state yada
> yada.  It isn't pretty.  In fact it's quite horrifically ugly.
>     After some googling, I came across an old reference indicating a
> similar problem being caused by a gcc compiler optimizer bug gen'ing
> up bad code.  Ok, so I tried recompiling maui with -O0 although the
> man page says this is the default.
>     No joy, same bad behavior.
>     Do I need to recompile torque as well?  Does -O0 work?  What the
> f*&k is going on?  Excuse me, I've been fighting this one for a while
> now.  Help GREATLY appreciated.  I need to replace the C scheduler
> with something.  I'd like to use maui.  But I'd like to be able to get
> it to work right twice before I commit the whole cluster to it.
>     Thank you.

      One other thing, probably related, maui keeps crashing and the
last line in the log is

ERROR:    cannot get node info: Unknown Job Id
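
To see whether the rejections cluster on particular nodes, the
maui.log lines can be tallied by host and return code. This is only a
rough sketch: the regular expression is modeled on the single
rejection line quoted above, the comma-separated hostlist is an
assumption, and tally_rejections is a made-up helper name, not part of
maui or torque.

```python
import re
from collections import Counter

# Pattern modeled on the one rejection line quoted in this message.
# Other maui versions may format the line differently (assumption).
REJECT_RE = re.compile(
    r"rc:\s*(?P<rc>\d+)\s+"
    r"errmsg:\s*'(?P<errmsg>[^']*)'\s+"
    r"hostlist:\s*'(?P<hosts>[^']*)'"
)


def tally_rejections(lines):
    """Count rejection lines per (host, rc) pair."""
    counts = Counter()
    for line in lines:
        match = REJECT_RE.search(line)
        if match is None:
            continue
        rc = int(match.group("rc"))
        # Assumption: multiple hosts, if present, are comma-separated.
        for host in match.group("hosts").split(","):
            counts[(host.strip(), rc)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "(rc: 15041  errmsg: 'Execution server rejected request "
        "MSG=send failed, STARTING'  hostlist: 'node15')",
    ]
    for (host, rc), n in tally_rejections(sample).items():
        print(host, rc, n)
```

Feeding it the real log (for example, the lines of an open maui.log
file) in place of the sample list would show whether node15 is the
only node rejecting requests or whether every P-III node does.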

Chris Johnson               |Internet: johnson at nmr.mgh.harvard.edu
Systems Administrator       |Web:      http://www.nmr.mgh.harvard.edu/~johnson
NMR Center                  |Voice:    617.726.0949
Mass. General Hospital      |FAX:      617.726.7422
149 (2301) 13th Street      |Life stinks.  If you're very lucky, sometimes
Charlestown, MA., 02129 USA |it stinks a little less.  Me
