[torqueusers] problem with libtm under torque 4.0

Roy Dragseth roy.dragseth at cc.uit.no
Tue Mar 27 15:59:24 MDT 2012


I have just installed torque 4.0 on my test cluster and there seem to be some 
issues with pbsdsh and OSC mpiexec.  Does anyone else have problems with these?

I just want to check before I dive deeper into this.

The problem I see shows up when I run pbsdsh within a job:

$ pbsdsh -u uname -a
pbsdsh: error from tm_poll() 17002

If I drop the -u flag it seems to work a bit better, but I still get some error 
messages.

$ pbsdsh uname -a
Linux compute-0-2.local 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012 
x86_64 x86_64 x86_64 GNU/Linux
pbsdsh: Event poll failed, error TM_ENOTCONNECTED
Linux compute-0-2.local 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012 
x86_64 x86_64 x86_64 GNU/Linux
Linux compute-0-1.local 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012 
x86_64 x86_64 x86_64 GNU/Linux
Linux compute-0-1.local 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012 
x86_64 x86_64 x86_64 GNU/Linux
pbsdsh: reconnected
pbsdsh: Event poll failed, error TM_ENOTFOUND


Also, pbs_mom tends to segfault when I try this.  From dmesg:

pbs_mom[16801]: segfault at 0000000000000020 rip 000000000040ac36 rsp 
00007fff32754f00 error 4
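
As a rough aid for reading that dmesg line, here is a minimal sketch that splits 
it into its fields and decodes the error value.  It assumes the usual x86 
page-fault error code layout (bit 0: page was present, bit 1: write access, 
bit 2: fault came from user mode) and that the kernel prints the fields in hex. 
The function name decode_segfault is only an illustration, not part of any tool.

```python
import re

# Pattern for the x86_64 "segfault at ... rip ... rsp ... error ..." kernel message.
# All four numeric fields are assumed to be printed in hexadecimal.
SEGFAULT_RE = re.compile(
    r"segfault at ([0-9a-f]+) rip ([0-9a-f]+) rsp ([0-9a-f]+) error ([0-9a-f]+)"
)

def decode_segfault(line):
    """Hypothetical helper: parse one segfault line and decode the error bits."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    addr, rip, rsp, err = (int(g, 16) for g in m.group(1, 2, 3, 4))
    return {
        "addr": addr,                    # faulting address
        "rip": rip,                      # instruction pointer at the fault
        "rsp": rsp,                      # stack pointer at the fault
        "page_present": bool(err & 1),   # bit 0: 0 means the page was not mapped
        "write": bool(err & 2),          # bit 1: 0 means a read access
        "user_mode": bool(err & 4),      # bit 2: 1 means the fault was in user space
    }

# The dmesg line from above, joined back into one line.
line = ("pbs_mom[16801]: segfault at 0000000000000020 rip 000000000040ac36 "
        "rsp 00007fff32754f00 error 4")
info = decode_segfault(line)
print(info)
```

Read this way, error 4 would be a user-mode read of an unmapped page at address 
0x20, which would be consistent with a NULL pointer dereference at a small 
offset.  That is only a guess until the rip is resolved against a pbs_mom 
binary built with debug symbols.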


Does anyone else see anything similar?

Torque v3.0.2 does not have this problem on the exact same setup.

This is on CentOS 5.8.  Torque is compiled without hwloc and I have not 
configured any cpusets.

Regards,
r.

-- 

  The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
	      phone:+47 77 64 41 07, fax:+47 77 64 41 00
        Roy Dragseth, Team Leader, High Performance Computing
	 Direct call: +47 77 64 62 56. email: roy.dragseth at uit.no

