[torqueusers] pbs_mom's child exits immediately after starting pbs_mom
Daniel Andrzejewski
andrzeje at cs.utk.edu
Mon Sep 29 09:22:00 MDT 2008
Hi All,
First of all, I tried to look for some information on the web, but couldn't find
anything related to my problem.
I am upgrading from Debian 3.1 to CentOS 5.2 on a cluster with 64bit processors.
There was no problem running torque 2.1.6. There is no problem running torque
2.3.3 on a 32bit CentOS, but there is a problem with 64bit CentOS.
I simply cannot start pbs_mom. It actually starts, but spawns a child which
immediately exits, so no pbs_mom is left running on the compute nodes. I decided
to take one machine, install just torque on it and investigate, but I cannot
find any logs explaining why pbs_mom's child exits.
The following are the last few lines of 'strace pbs_mom':
fcntl(4, F_SETLK, {type=F_UNLCK, whence=SEEK_SET, start=0, len=0}) = 0
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x2b9b22f05db0) = 24415
--- SIGCHLD (Child exited) @ 0 (0) ---
exit_group(0) = ?
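The trace above only covers the parent process, so it shows the fork and the parent's exit but nothing the child does. A minimal sketch of how the child could be traced as well, assuming strace's standard -f (follow forks) and -o (write to file) options; the function name, the TRACER override and the output path are illustrative and not part of torque:

```shell
# Sketch: trace a command and every process it forks (strace -f),
# writing the combined trace to a file (strace -o).
# TRACER is a hypothetical override so the command line can be
# previewed (e.g. TRACER=echo) without actually running strace.
trace_with_children() {
    out_file=$1
    shift
    "${TRACER:-strace}" -f -o "$out_file" "$@"
}

# Example (the output path is an assumption):
#   trace_with_children /tmp/pbs_mom.trace pbs_mom
```

With -f, each line in the output file is prefixed with the PID that made the call, so the last system calls of the child before it exits should be visible.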
I went to the troubleshooting section of the torque documentation (10.1.5, Using
GDB to Locate Failures). When I export the environment variable PBSDEBUG=yes and
start pbs_mom under gdb, it runs fine and doesn't show any problems:
[root@frodo9 ~]# gdb pbs_mom
GNU gdb Red Hat Linux (6.5-37.el5rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db
library "/lib64/libthread_db.so.1".
(gdb) run
Starting program: /usr/local/sbin/pbs_mom
MOM is up
The problem with using PBSDEBUG=yes (or the --enable-debug flag at configure
time) is that pbs_mom doesn't go into the background.
These are the options I use while configuring torque:
./configure --prefix=/pkgs/torque-2.1.6 \
    --with-server-home=/sw/var/torque \
    --with-pam=/lib64/security \
    --with-scp \
    --with-default-server=frodo9.sinrg.local \
    --with-sendmail=/usr/sbin/sendmail \
    --enable-syslog \
    --disable-rpp \
    --disable-gui \
    --disable-gcc-warnings
Maybe the following information could help. I exported PBSDEBUG=yes, started
pbs_mom in one window, and ran some diagnostics in another window:
[root@frodo9 ~]# pbsnodes
frodo9.sinrg.local
     state = free
     np = 2
     ntype = cluster
     status = opsys=linux,uname=Linux frodo9 2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64,sessions=23738,nsessions=1,nusers=1,idletime=54,totmem=4156776kb,availmem=4001240kb,physmem=2059632kb,ncpus=2,loadave=0.00,gres=server:frodo9.sinrg.local,netload=59948748,state=free,jobs=? 0,rectime=1222697670
[root@frodo9 ~]# momctl -d 3
Host: frodo9/frodo9.sinrg.local Version: 2.1.6
Server[0]: frodo9 (172.16.0.9)
Init Msgs Received: 1 hellos/1 cluster-addrs
Init Msgs Sent: 70 hellos
Last Msg From Server: 61 seconds (CLUSTER_ADDRS)
Last Msg To Server: 15 seconds
PID: 24809
HomeDirectory: /sw/var/torque/mom_priv
MOM active: 145 seconds
Server Update Interval: 45 seconds
LOGLEVEL: 0 (use SIGUSR1/SIGUSR2 to adjust)
Communication Model: TCP
NOTE: no prolog configured
Alarm Time: 0 of 10 seconds
Trusted Client List: 172.16.0.9,127.0.0.1
Configured to use /usr/bin/scp -rpB
NOTE: no local jobs detected
diagnostics complete
The output above is from torque 2.1.6, but that shouldn't matter since 2.3.3
behaves the same way.
I also tried to compile torque 2.3.3 with 'export CFLAGS=-m32' but this didn't
fix the problem.
andrzeje:frodo9 /export/src/torque-2.3.3> file /pkgs/torque-2.3.3/sbin/pbs_mom
/pkgs/torque-2.3.3/sbin/pbs_mom: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped
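For reference, the same 32-bit/64-bit check can be done without the file utility by reading the ELF header directly. This is only a sketch: the function name elf_class is made up for illustration, and it relies on the ELF layout in which bytes 0-3 are the magic number and byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects.

```shell
# Print "32-bit" or "64-bit" for an ELF file by inspecting its header.
# Bytes 0-3 must be the magic 0x7f 'E' 'L' 'F'; byte 4 (EI_CLASS) is
# 1 for 32-bit objects and 2 for 64-bit objects.
elf_class() {
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not ELF"
        return 1
    fi
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown ELF class"; return 1 ;;
    esac
}

# Example, using the binary from the output above:
#   elf_class /pkgs/torque-2.3.3/sbin/pbs_mom
```

Running it against both the 2.1.6 and 2.3.3 builds would show whether each binary really was built for the intended word size.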
Please advise!
Thanks,
Daniel
--
Daniel Andrzejewski
student IT Administrator
Elec Engr & Comp Science
University of Tennessee
(865) 974 - 4388 (work)
"Investment in knowledge always pays the best interest" Benjamin Franklin