[torqueusers] segfaulting pbs_moms: torque-2.3.6-2cri.x86_64

Garrick Staples garrick at usc.edu
Thu Nov 5 12:29:55 MST 2009


I used a mix of 32-bit and 64-bit pbs_moms for years.  It was never a problem.

This is just another bug in the 2.3.x line.  The 2.1.x line is stable.

On Thu, Nov 05, 2009 at 11:24:02AM -0500, Tom Pierce alleged:
> Dear Douglas,
> 
> I had mixed 32-bit moms and 64-bit moms and it did not work well.  I
> recovered by switching to a full 32-bit setup for Torque, both pbs and
> moms.  Later, when the full architecture was 64-bit, I moved up to 64-bit
> everywhere.
> 
> my two cents.
> 
> Tom
> 
> On Wed, Nov 4, 2009 at 4:50 AM, Douglas McNab <d.mcnab at physics.gla.ac.uk> wrote:
> > Hi,
> >
> > I have an issue with segfaulting moms that seems correlated with the
> > server trying to ping its moms.
> > The server version is torque-2.3.6-2cri.x86_64.
> > We are currently supporting two OSes through the same batch system using a
> > submit filter and node properties.  Therefore, we have two different
> > versions of moms.
> > Nodes 1->295 have moms torque-2.3.6-2cri.x86_64 and 296->309 have moms
> > torque-2.1.9-4cri.slc4.i386
> >
> > When the moms segfault, we see that the torque-2.1.9 moms stay up and
> > all of the torque-2.3.6 moms die.
> >
> > I ran one of them through GDB and can see the call stack:
> >
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x000000000041813f in ?? ()
> > (gdb) where
> > #0  0x000000000041813f in ?? ()
> > #1  0x000000000041985e in ?? ()
> > #2  0x0000000000419a70 in ?? ()
> > #3  0x0000000000416b97 in close_conn ()
> > #4  0x0000000000416c52 in close_conn ()
> > #5  0x00002b12d6cd7488 in wait_request () from /usr/lib64/libtorque.so.2
> > #6  0x0000000000416e1d in close_conn ()
> > #7  0x00000000004170e1 in close_conn ()
> > #8  0x00002b12d6f2b974 in __libc_start_main () from /lib64/libc.so.6
> > #9  0x0000000000405eb9 in close_conn ()
> > #10 0x00007fff7565e368 in ?? ()
> > #11 0x0000000000000000 in ?? ()
> >
> > Unfortunately this doesn't really give me any clues.
> > Does anyone have any other ideas?
> >
> > Cheers,
> >
> > Dug
> >
> > --
> > ScotGrid, Room 481, Kelvin Building, University of Glasgow
> >
> >
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
> >
> >
> 
> 
> 
> -- 
> -----------------------
> Thanks
> 
> Tom

-- 
Garrick Staples, GNU/Linux HPCC SysAdmin
University of Southern California

Life is Good!