[Mauiusers] Re: mauiusers Digest, Vol 12, Issue 12

Dave Jackson jacksond at clusterresources.com
Tue Jul 26 19:16:54 MDT 2005


Josh,

  Maui does not call the 'alarm()' routine itself; that call exists only
inside the TORQUE/PBS API.  Consequently, the SIGALRM signal you are
seeing most likely indicates that one or more of your compute nodes are
having issues.  To address this, I recommend looking at the
'job_stat_rate' and 'poll_jobs' parameters described in the online
TORQUE documentation at

http://www.clusterresources.com/products/torque/docs/3.4largesystems.shtml
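
  For context on why the process simply disappears: the default action
for SIGALRM is to terminate the process, so a timer armed with alarm()
kills the program when it fires unless a handler has been installed.
The following is only a minimal sketch of that behaviour in Python, not
Maui or TORQUE code; the child program and the run_child_until_alarm()
helper are made up for illustration.

```python
import signal
import subprocess
import sys

# Hypothetical child program: arm a 1-second alarm, install no SIGALRM
# handler, then block as if waiting on an unresponsive node.
CHILD = "import signal, time; signal.alarm(1); time.sleep(30)"


def run_child_until_alarm():
    """Run the child and return its exit status as seen by the parent."""
    proc = subprocess.run([sys.executable, "-c", CHILD])
    return proc.returncode


if __name__ == "__main__":
    rc = run_child_until_alarm()
    # On POSIX, a negative return code -N means "killed by signal N".
    print("killed by SIGALRM:", rc == -signal.SIGALRM)
```

  A return code of -SIGALRM in the parent corresponds to what gdb
reports as "Program terminated with signal SIGALRM".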

  Please let us know if this addresses your issue.

Thanks,
Dave

On Wed, 2005-07-27 at 09:03 +0800, group hpc wrote:
> Hi,
> 
> The following is the output from gdb. Please help. Thanks.
> 
> (gdb) r
> Starting program: /usr/local/maui/sbin/maui
> Detaching after fork from child process 5014.
> Detaching after fork from child process 5479.
> Detaching after fork from child process 5904.
> Detaching after fork from child process 6320.
> Detaching after fork from child process 6733.
> Detaching after fork from child process 7156.
> Detaching after fork from child process 7561.
> Detaching after fork from child process 7974.
> Detaching after fork from child process 8393.
> Detaching after fork from child process 8816.
> Detaching after fork from child process 9241.
> Detaching after fork from child process 9666.
> Detaching after fork from child process 10565.
> Detaching after fork from child process 10980.
> Detaching after fork from child process 11393.
> Detaching after fork from child process 11810.
> Detaching after fork from child process 12221.
> Detaching after fork from child process 12642.
> Detaching after fork from child process 13068.
> Detaching after fork from child process 13369.
> 
> Program terminated with signal SIGALRM, Alarm clock.
> The program no longer exists.
> (gdb) where
> No stack.
> 
> Best Regards,
> Josh
> 
> On 7/19/05, mauiusers-request at supercluster.org
> <mauiusers-request at supercluster.org> wrote:
> > 
> > Today's Topics:
> > 
> >   1. Re: Maui exit by itself (mcgregor at fnal.gov)
> >   2. Re: Maui exit by itself (Wightman)
> > 
> > 
> > ----------------------------------------------------------------------
> > 
> > Message: 1
> > Date: Sun, 17 Jul 2005 17:03:29 -0600
> > From: mcgregor at fnal.gov
> > Subject: Re: [Mauiusers] Maui exit by itself
> > To: mauiusers at supercluster.org
> > Message-ID: <a915d9607064.42da8f61 at fnal.gov>
> > Content-Type: text/plain; charset=us-ascii
> > 
> > I have seen maui exit with exactly the same error message. I would really appreciate any advice on this matter.
> > 
> > Gordon.
> > 
> > ----- Original Message -----
> > From: group hpc <hpc.group at gmail.com>
> > Date: Thursday, July 14, 2005 7:50 pm
> > Subject: [Mauiusers] Maui exit by itself
> > 
> > > Hi,
> > >
> > > I would like to find out why the maui scheduler suddenly exits by
> > > itself. Does anyone know how to resolve this problem? The following
> > > are the last log messages before it exits.
> > >
> > > 07/14 14:11:06 MSURecvPacket(10,BufP,4,NULL,100000)
> > > 07/14 14:11:07 ServerProcessRequests()
> > > 07/14 14:11:07 INFO:     not rolling logs (6889559 < 10000000)
> > > 07/14 14:11:07 MResAdjust(NULL,0,0)
> > > 07/14 14:11:07 MStatInitializeActiveSysUsage()
> > > 07/14 14:11:07 MStatClearUsage([NONE],Active)
> > > 07/14 14:11:07 ServerUpdate()
> > > 07/14 14:11:07 MSysUpdateTime()
> > > 07/14 14:11:07 INFO:     starting iteration 1215
> > > 07/14 14:11:07 MRMGetInfo()
> > > 07/14 14:11:07 MClusterClearUsage()
> > > 07/14 14:11:07 MRMClusterQuery()
> > > 07/14 14:11:07 MPBSClusterQuery(META,RCount,SC)
> > > 07/14 14:11:07 ERROR:    cannot get node info: NULL
> > >
> > > Thanks.
> > > --
> > > Josh
> > > _______________________________________________
> > > mauiusers mailing list
> > > mauiusers at supercluster.org
> > > http://www.supercluster.org/mailman/listinfo/mauiusers
> > >
> > 
> > 
> > 
> > ------------------------------
> > 
> > Message: 2
> > Date: Mon, 18 Jul 2005 08:46:58 -0600
> > From: Wightman <wightman at clusterresources.com>
> > Subject: Re: [Mauiusers] Maui exit by itself
> > To: mauiusers at supercluster.org
> > Message-ID: <1121698018.3122.1.camel at oahu.icluster.org>
> > Content-Type: text/plain
> > 
> > Could someone who is seeing this bug please catch it under gdb and send
> > the output of the "where" command?
> > 
> > This information will be extremely useful in tracking down where the
> > failure is occurring.
> > 
> > Thanks,
> > 
> > - Douglas
> > Cluster Resources, Inc.
> > 
> > 
> > 
> > 
> > ------------------------------
> > 
> > 
> > 
> > End of mauiusers Digest, Vol 12, Issue 12
> > *****************************************
> > 
> 
> 


