[Mauiusers] Torque causing maui to segfault
Andrus, Brian Contractor
bdandrus at nps.edu
Mon Jan 31 11:28:52 MST 2011
Thanks for the info. I'll give it a shot!
Naval Postgraduate School
From: mauiusers-bounces at supercluster.org
[mailto:mauiusers-bounces at supercluster.org] On Behalf Of Jason Williams
Sent: Monday, January 31, 2011 10:17 AM
To: mauiusers at supercluster.org
Subject: Re: [Mauiusers] Torque causing maui to segfault
Andrus, Brian Contractor wrote:
> I am using maui 3.3 which seems to be the latest.
> Brian Andrus
> ITACS/Research Computing
> Naval Postgraduate School
> Monterey, California
> -----Original Message-----
> From: mauiusers-bounces at supercluster.org
> [mailto:mauiusers-bounces at supercluster.org] On Behalf Of Jason
> Sent: Monday, January 31, 2011 10:06 AM
> To: mauiusers at supercluster.org
> Subject: Re: [Mauiusers] Torque causing maui to segfault
> Andrus, Brian Contractor wrote:
>> I am finding that maui has been segfaulting lately.
>> It does it as soon as it starts. I have tried running it in the
>> foreground with -d to watch, but no info is provided beyond
>> 'Segmentation Fault'
>> As I troubleshoot, I have discovered that if I restart pbs_server, maui
>> seems happy again and will run, at least until an array job is
>> submitted. I haven't been able to test to see if there is a
>> variable about an array job that affects things. I do know an array
>> of 500 slots with nodes=1:ppn=1 does cause grief.
>> Has anyone seen this or have any ideas?
> What version of Maui are you running? The version currently in the
> subversion trunk for maui has some fixes to a few memory problems I
> found that caused mysterious segmentation faults. If you're not running
> that version, I'd give it a try.
The version in subversion trunk is 3.3.1. Brian at Adaptive
Computing/Cluster Resources hasn't rolled it into a new release yet.
3.3.1 from trunk is the one with a bunch of my memory fixes in it. I've
been running it on my 170 node cluster over here for a while now with no
problems and the fixes did fix a segmentation fault problem very similar
to yours. I'd suggest trying it if you are comfortable doing so.