[torqueusers] pbs_mom dies on exit of interactive session

DuChene, StevenX A stevenx.a.duchene at intel.com
Fri Apr 27 22:29:17 MDT 2012


I don't suppose you have any idea why I am having tm connect problems in general, do you?

Or any ideas about what I could look at?
--
Steven DuChene

From: torqueusers-bounces at supercluster.org [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Ken Nielson
Sent: Friday, April 27, 2012 9:23 PM
To: Torque Users Mailing List
Cc: Brady Kimball; David Hill; Ryan Chabot
Subject: Re: [torqueusers] pbs_mom dies on exit of interactive session

On Fri, Apr 27, 2012 at 9:21 PM, DuChene, StevenX A <stevenx.a.duchene at intel.com> wrote:
I am running torque-4.0.1 that I pulled from the svn 4.0.1 branch just today.
Earlier today I was running the 4.0-fixes tree from 04/03 and I had the same results.
I was hoping the update to current sources would fix these problems but no such luck.

If I run the following:

qsub -I -l nodes=7 -l arch=atomN570

from my pbs job submission host I get:

qsub: waiting for job 4.login2.sep.here to start
qsub: job 4.login2.sep.here ready

and then I get a shell prompt on node 0 of this job.

If I then do:

$ echo $PBS_NODEFILE
/var/spool/torque/aux//4.login2.sep.here

And then:

$ cat /var/spool/torque/aux//4.login2.sep.here
atom255
atom255
atom255
atom255
atom254
atom254
atom254

and then I try:

$ pbsdsh -h atom254 ls /tmp
pbsdsh: error from tm_poll() 17002

Alternatively if I use the -v option it says:

$ pbsdsh -v -h atom254 /bin/ls /tmp
pbsdsh: tm_init failed, rc = TM_ESYSTEM (17000)
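
As a side check, the node file shown above can be tallied per host to confirm that the seven requested slots were spread across two nodes before pbsdsh is tried. This is only a minimal sketch: count_slots is a hypothetical helper name, and it assumes the node file lists one hostname per line for each allocated slot, as in the listing above.

```shell
#!/bin/sh
# count_slots FILE (hypothetical helper): print "<host> <slots>" for each
# host in a node file that has one hostname per line per allocated slot.
# Output is sorted by host name.
count_slots() {
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}

# Inside a job, PBS_NODEFILE points at that job's node file.
# Outside a job the variable is unset, so nothing is printed.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    count_slots "$PBS_NODEFILE"
fi
```

For the node file above this would print "atom254 3" and "atom255 4", which matches the nodes=7 request.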


Steve,

I am able to reproduce the SIGABRT on the MOM. We will get this fixed. Thanks for the help.

Ken