[torqueusers] problem with libtm under torque 4.0
Roy Dragseth
roy.dragseth at cc.uit.no
Tue Mar 27 15:59:24 MDT 2012
I have just installed Torque 4.0 on my test cluster and there seem to be some
issues with pbsdsh and OSC mpiexec. Does anyone else have problems with these?
I just want to check before I dive deeper into this.
The problem I see shows up when I run pbsdsh within a job:
$ pbsdsh -u uname -a
pbsdsh: error from tm_poll() 17002
If I drop the -u flag it seems to work a bit better, but I still get some error
messages:
$ pbsdsh uname -a
Linux compute-0-2.local 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012
x86_64 x86_64 x86_64 GNU/Linux
pbsdsh: Event poll failed, error TM_ENOTCONNECTED
Linux compute-0-2.local 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012
x86_64 x86_64 x86_64 GNU/Linux
Linux compute-0-1.local 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012
x86_64 x86_64 x86_64 GNU/Linux
Linux compute-0-1.local 2.6.18-308.1.1.el5 #1 SMP Wed Mar 7 04:16:51 EST 2012
x86_64 x86_64 x86_64 GNU/Linux
pbsdsh: reconnected
pbsdsh: Event poll failed, error TM_ENOTFOUND
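
For reference, the numeric code in the first error can be matched against the
TM error constants. The following is a minimal Python sketch. The table is
transcribed from memory of Torque's tm_.h header and is an assumption that
should be checked against the installed header, and the helper name
tm_error_name is hypothetical. If the table is right, the 17002 reported by
tm_poll() is the same TM_ENOTCONNECTED condition that the second run prints
by name.

```python
# Assumed TM error codes, transcribed from memory of Torque's tm_.h.
# Verify against the header shipped with your Torque version before relying on it.
TM_ERRORS = {
    0: "TM_SUCCESS",
    17000: "TM_ESYSTEM",
    17001: "TM_ENOEVENT",
    17002: "TM_ENOTCONNECTED",
    17003: "TM_EUNKNOWNCMD",
    17004: "TM_ENOTIMPLEMENTED",
    17005: "TM_EBADENVIRONMENT",
    17006: "TM_ENOTFOUND",
    17007: "TM_BADINIT",
}


def tm_error_name(code):
    """Return the symbolic name for a TM error code, or a placeholder if unknown."""
    return TM_ERRORS.get(code, "unknown TM error %d" % code)


if __name__ == "__main__":
    # Decode the code from "pbsdsh: error from tm_poll() 17002".
    print(tm_error_name(17002))
```
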
pbs_mom also tends to segfault when I try this. From dmesg:
pbs_mom[16801]: segfault at 0000000000000020 rip 000000000040ac36 rsp
00007fff32754f00 error 4
Does anyone else see anything similar?
Torque v3.0.2 does not have this problem on the exact same setup.
This is on CentOS 5.8. Torque is compiled without hwloc and I have not
configured any cpusets.
Regards,
r.
--
The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
phone:+47 77 64 41 07, fax:+47 77 64 41 00
Roy Dragseth, Team Leader, High Performance Computing
Direct call: +47 77 64 62 56. email: roy.dragseth at uit.no