[torqueusers] maui & old gcc optimizer bug. (fwd)

Chris Johnson johnson at nmr.mgh.harvard.edu
Wed Nov 16 11:47:07 MST 2005


      Hi all,

      I have a little annoyance here which is driving me up the wall.  I
have two mini Maui clusters running TORQUE.  The well-behaved one
is on CentOS 4.1 with Opteron hardware.  The less-than-ideal one
is on FC2 with P-III hardware.

      The one on the Opterons is running terrifically.

      The one on the P-IIIs gives me maui.log errors like this:

(rc: 15041  errmsg: 'Execution server rejected request MSG=send failed, 
STARTING'  hostlist: 'node15')

and doesn't run jobs very often, putting them in the deferred state,
yada yada.  It isn't pretty.  In fact it's quite horrifically ugly.
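
      For anyone who wants to tally these rejections per node, here is
a minimal sketch in Python.  It assumes the log entries keep the
rc/errmsg/hostlist layout of the one fragment quoted above, and that a
multi-host hostlist is comma-separated (an assumption; only a
single-host sample is shown).  The name count_rejections is
illustrative and not part of Maui or TORQUE.

```python
import re
from collections import Counter

# Pattern for the fragment quoted above. The layout is assumed from that
# single sample; the errmsg may wrap across lines, which [^']* tolerates.
REJECT_RE = re.compile(
    r"\(rc:\s*(?P<rc>\d+)\s+errmsg:\s*'(?P<errmsg>[^']*)'"
    r"\s+hostlist:\s*'(?P<hosts>[^']*)'\)"
)

def count_rejections(log_text):
    """Return a Counter mapping (host, rc) to the number of matching entries."""
    counts = Counter()
    for match in REJECT_RE.finditer(log_text):
        rc = int(match.group("rc"))
        # Assumption: several hosts, if present, are separated by commas.
        for host in match.group("hosts").split(","):
            counts[(host.strip(), rc)] += 1
    return counts

# The sample is the fragment from this message, including its line wrap.
sample = (
    "(rc: 15041  errmsg: 'Execution server rejected request MSG=send failed, \n"
    "STARTING'  hostlist: 'node15')"
)
print(count_rejections(sample))
```

Feeding it the full text of maui.log instead of the sample would show
whether the failures cluster on a few nodes or hit all of them.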

      After some googling, I came across an old reference indicating a
similar problem caused by a gcc optimizer bug generating bad code.
OK, so I tried recompiling Maui with -O0, although the man page
says this is the default.

      No joy, same bad behavior.

      Do I need to recompile TORQUE as well?  Does -O0 work?  What the
f*&k is going on?  Excuse me, I've been fighting this one for a while
now.  Any help would be GREATLY appreciated.  I need to replace the C
scheduler with something.  I'd like to use Maui.  But I'd like to be
able to get it to work right twice before I commit the whole cluster
to it.

      One other thing, probably related: Maui keeps crashing, and the
last line in the log is:

ERROR:    cannot get node info: Unknown Job Id


      Thank you.

-------------------------------------------------------------------------------
Chris Johnson               |Internet: johnson at nmr.mgh.harvard.edu
Systems Administrator       |Web:      http://www.nmr.mgh.harvard.edu/~johnson
NMR Center                  |Voice:    617.726.0949
Mass. General Hospital      |FAX:      617.726.7422
149 (2301) 13th Street      |"Quantum mechanics demands that magic exists"
Charlestown, MA., 02129 USA |        Me
-------------------------------------------------------------------------------



