[Mauiusers] maui & old gcc optimizer bug.
garrick at usc.edu
Sat Nov 19 14:20:09 MST 2005
On Wed, Nov 16, 2005 at 09:13:59AM -0500, Chris Johnson alleged:
> Hi all,
> Have a little annoyance here which is driving me up the wall. I
> have two mini Maui clusters with TORQUE running. The well-behaved one
> is on CentOS 4.1 with Opteron architecture. The less-than-ideal one
> is on FC2 with P-III hardware.
> The one on the opterons is running terrific.
> The one on the P-IIIs gives me maui.log errors like this:
> (rc: 15041 errmsg: 'Execution server rejected request MSG=send failed,
> STARTING' hostlist: 'node15')
The "Execution server" in this case is node15. Off the top of my head,
I'd say the most likely cause is a long-running prologue.
As Chris mentioned, this probably has nothing to do with Maui. What
version of TORQUE?
In 1.2.0p1 we added the $jobstartblocktime MOM config parameter. It
specifies the number of seconds pbs_mom will block on the initial
attempt to start the job. After jobstartblocktime seconds, pbs_mom
returns "ask me again later" to pbs_server. Unfortunately,
pbs_server is also blocked during that time. The default is 5 seconds.
I set mine to 0 for fully non-blocking job startup. IMHO, 0 is the
ideal value but I don't think it is widely tested outside of my cluster.
You can try higher or lower values on-the-fly with
'momctl -h node15 -q jobstartblocktime=X'.
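To make the blocking behaviour concrete, here is a toy Python model. It is a sketch, not TORQUE source: the function name start_job, the simulated prologue, and the string return values are hypothetical. Only the idea of waiting at most jobstartblocktime seconds on the first start attempt, then answering "ask me again later", comes from the description above.

```python
import threading
import time


def start_job(prologue_seconds: float, jobstartblocktime: float) -> str:
    """Toy model of pbs_mom's initial job-start attempt (hypothetical).

    A worker thread stands in for a (possibly long-running) prologue.
    The caller, playing the role of pbs_server, waits at most
    jobstartblocktime seconds for the prologue to finish.
    """
    done = threading.Event()

    def run_prologue() -> None:
        time.sleep(prologue_seconds)  # simulated site prologue
        done.set()

    # Daemon thread so a slow "prologue" never keeps the interpreter alive.
    threading.Thread(target=run_prologue, daemon=True).start()

    # Event.wait returns True if the flag was set within the timeout.
    # A timeout of 0 checks the flag once and returns immediately.
    if done.wait(timeout=jobstartblocktime):
        return "started"
    return "ask me again later"


if __name__ == "__main__":
    # Fast prologue, generous block time: the start completes in one call.
    print(start_job(prologue_seconds=0.05, jobstartblocktime=2.0))
    # Slow prologue: the caller gives up after 0.1 s instead of hanging.
    print(start_job(prologue_seconds=1.0, jobstartblocktime=0.1))
```

In this model the caller is stuck for up to jobstartblocktime seconds on every slow start. A value of 0 removes that wait entirely, at the cost of almost always taking the "ask me again later" path and following up later.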
Garrick Staples, Linux/HPCC Administrator
University of Southern California