[torquedev] Problem with TM interface in Torque 2.1.0p0

garrick at speculation.org garrick at speculation.org
Fri May 19 15:07:42 MDT 2006

On Fri, May 19, 2006 at 09:01:38AM -0400, Brock Palen alleged:
> I'm jumping into the middle of this. I represent the owners of the PPC
> cluster that's having the problem with TM and torque-2.1.0p0. The
> machine previously ran PBSPro, but we have been switching everything
> (Linux, OSX) to Torque+Moab from PBSPro+Maui. Below is the
> requested information.

So you have some of your PPC nodes running Linux and some running OSX?
Does the problem happen on Linux, OSX, or both?  I have 0 experience with
Linux on PPC, but I do have some OSX boxes to play with over here.

> aon:~ root# /home/software/torque-2.1.0p0/sbin/momctl -d 4 -h aon038
> Host: aon038.engin.umich.edu/aon038.engin.umich.edu   Version: 2.1.0p0
> job[407.aon.engin.umich.edu]  state=RUNNING  sidlist=24059
> Assigned CPU Count:     2

Job in running state, with 2 allocated CPUs.  Good..

>     exec_host = aon038/1+aon038/0

exec_host is set, that's good.
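
For anyone who wants to sanity-check an exec_host value by hand, here is a
minimal sketch in C that splits a string like the one above into host/index
pairs. It assumes the format is "host/index" entries joined by "+", as seen
in this momctl output. The names parse_exec_host and struct exec_slot are
made up for illustration; this is not code from Torque.

```c
#include <stdlib.h>
#include <string.h>

#define HOST_MAX 64

/* One entry of an exec_host string, e.g. "aon038/1" (hypothetical type). */
struct exec_slot {
    char host[HOST_MAX];
    int  index;              /* the number after the slash */
};

/*
 * Split "host/idx+host/idx+..." into slots[] (hypothetical helper).
 * Returns the number of entries parsed, or -1 if the input is
 * malformed or holds more than max entries.
 */
int parse_exec_host(const char *s, struct exec_slot *slots, int max)
{
    int n = 0;

    while (*s != '\0') {
        const char *slash = strchr(s, '/');
        size_t hlen;
        char *end;
        long idx;

        if (slash == NULL || n >= max)
            return -1;

        hlen = (size_t)(slash - s);
        /* Reject empty or oversized host names, and entries with no slash. */
        if (hlen == 0 || hlen >= HOST_MAX || memchr(s, '+', hlen) != NULL)
            return -1;

        idx = strtol(slash + 1, &end, 10);
        if (end == slash + 1 || idx < 0 || (*end != '+' && *end != '\0'))
            return -1;

        memcpy(slots[n].host, s, hlen);
        slots[n].host[hlen] = '\0';
        slots[n].index = (int)idx;
        n++;

        if (*end == '+') {
            s = end + 1;
            if (*s == '\0')      /* trailing '+' with nothing after it */
                return -1;
        } else {
            s = end;
        }
    }
    return n;
}
```

Calling parse_exec_host("aon038/1+aon038/0", slots, 8) should fill two
entries, both on aon038, which matches the assigned CPU count of 2.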

So pbs_mom has the right info, but for some reason the nodelist isn't
getting passed back to TM clients.

From reading the code, it still looks like that error message can only
come from a failed calloc(), but that isn't a plausible explanation
given that you have no other complaints about your system.

I'll poke at an OSX box here.

