[torqueusers] Jobs not terminating

Hristo Iliev hristo at mc.phys.uni-sofia.bg
Wed Mar 29 09:50:56 MST 2006


On Wed, 2006-03-29 at 11:05 -0500, Tom Combs wrote:
> Hi,  I just upgraded to torque-2.0.0.p8 and now jobs do not terminate nor
> can they be qdel'd.  In the mom_logs on the nodes, I have the following:
> 
>  pbs_mom;Req;jobobit;No contact with server at hostaddr c000000a, port 15000
> 
> I have hostbased authentication working for all users between the master
> node and compute nodes - in both directions but that doesn't appear to be
> the issue. Jobs go into execution and seem to run just fine, it's just the
> pbs job never terminates.
> 
> Does anyone know what my problem could be?
> 
> TIA,  Tom Combs
> 

Hi.

Recently we experienced the same problem after moving to 2.0.0p8, and the
cause turned out to be a poorly set up /etc/hosts file. On each node, the
node's own hostname first appeared on the line for localhost (127.0.0.1),
so the name presumably resolved to the loopback address instead of the
node's real one. Strangely enough, this setup worked quite well with
Torque 1.2.0p6.
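
If you want to check a node for this quickly, something like the following
rough Python sketch might help. It only looks for the node's hostname on a
loopback line of /etc/hosts; it is not anything Torque itself provides. The
function names loopback_names and hostname_on_loopback are made up for this
example, and it assumes the usual "address name [alias ...]" layout of the
hosts file.

```python
import socket


def loopback_names(hosts_text):
    """Return every name that hosts_text maps to 127.x.x.x or ::1."""
    names = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, aliases = fields[0], fields[1:]
        if address.startswith("127.") or address == "::1":
            names.update(aliases)
    return names


def hostname_on_loopback(hosts_text, hostname):
    """True if hostname (or its short form) sits on a loopback line."""
    short = hostname.split(".", 1)[0]
    names = loopback_names(hosts_text)
    return hostname in names or short in names


if __name__ == "__main__":
    try:
        with open("/etc/hosts") as hosts_file:
            text = hosts_file.read()
    except OSError as err:
        print("could not read /etc/hosts:", err)
    else:
        name = socket.gethostname()
        if hostname_on_loopback(text, name):
            print(name, "is listed on a loopback line in /etc/hosts")
        else:
            print(name, "is not listed on a loopback line")
```

Run on each compute node, it should flag a file where the first line reads
like "127.0.0.1 node01 localhost" (node01 being a hypothetical node name).
Moving the hostname off that line and onto the line with the node's real
address is the change that fixed it for us.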

Hristo Iliev


