[torqueusers] Getting "qsub: Job rejected by all possible destinations"

Prakash Velayutham prakash.velayutham at cchmc.org
Wed Feb 25 08:41:05 MST 2009


Hi,

Torque-2.3.6 Server and MOMs
Torque-2.1.10 submission client

I am getting "qsub: Job rejected by all possible destinations" on the  
client side. Here are some details which are baffling me.

My client's hostname is client1.domain.com and its IP is x.y.z. This
record is in the DNS server.

In the client's /etc/hosts file, I have the following entry:

x.y.z	client2.domain.com	client2

This entry has existed for a while, and the same jobs used to work
with it. But suddenly, starting at around 09:35 this morning, the
Torque server began rejecting them. Once I remove the entry from
/etc/hosts, jobs go in fine.

I have client1.domain.com in the Torque server's /etc/hosts.equiv and
also in acl_hosts (set through qmgr).

Even now, if I add the same entry back to /etc/hosts, the jobs are
rejected again. I have no idea why this is happening, because an
nslookup on the Torque server for the client's IP address x.y.z
returns client1.domain.com. This is baffling and disturbing. Following
are the relevant entries from the Torque server log:

02/25/2009 09:35:02;0100;PBS_Server;Req;;Type AuthenticateUser request received from user@client1.domain.com, sock=13
02/25/2009 09:35:02;0100;PBS_Server;Req;;Type QueueJob request received from user@client1.domain.com, sock=10
02/25/2009 09:35:02;0100;PBS_Server;Req;;Type JobScript request received from user@client1.domain.com, sock=10
02/25/2009 09:35:02;0100;PBS_Server;Req;;Type ReadyToCommit request received from user@client1.domain.com, sock=10
02/25/2009 09:35:02;0100;PBS_Server;Req;;Type Commit request received from user@client1.domain.com, sock=10
02/25/2009 09:35:02;0100;PBS_Server;Job;2799.bmiclustersvcd1.cchmc.org;enqueuing into routing, state 1 hop 1
02/25/2009 09:35:02;0008;PBS_Server;Job;2799.bmiclustersvcd1.cchmc.org;Job rejected by all possible destinations
02/25/2009 09:35:02;0100;PBS_Server;Job;2799.bmiclustersvcd1.cchmc.org;dequeuing from routing, state QUEUED
02/25/2009 09:35:02;0080;PBS_Server;Req;req_reject;Reject reply code=15039(Job rejected by all possible destinations), aux=0, type=Commit, from user@client1.domain.com
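A client-side comparison might help narrow this down. nslookup queries the DNS servers directly and does not read /etc/hosts, whereas the client's system resolver usually consults /etc/hosts first (when /etc/nsswitch.conf lists files before dns). So the server's nslookup and the client's own resolver can disagree about the name for x.y.z. Below is a minimal sketch, assuming Python 3. The helper names and the placeholder address 10.0.0.5 (standing in for x.y.z) are hypothetical, and the sketch does not show which name qsub or pbs_server actually uses.

```python
"""Hypothetical helpers for comparing /etc/hosts and resolver names for an IP.

Assumption: the first name after the IP on a hosts line is the canonical
name, and the local resolver returns it for a reverse lookup when 'files'
is consulted before 'dns' in /etc/nsswitch.conf.
"""
import socket
from typing import Optional


def hosts_name_for_ip(hosts_text: str, ip: str) -> Optional[str]:
    """Return the canonical name for `ip` from hosts-file text, or None."""
    for raw in hosts_text.splitlines():
        # Drop trailing comments, then split on any whitespace (tabs or spaces).
        fields = raw.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == ip:
            return fields[1]  # first matching line wins, canonical name first
    return None


def resolver_name_for_ip(ip: str) -> str:
    """Reverse-resolve `ip` with the system resolver (goes through NSS)."""
    return socket.gethostbyaddr(ip)[0]


if __name__ == "__main__":
    # Placeholder data mirroring the post; 10.0.0.5 stands in for x.y.z.
    hosts = "10.0.0.5\tclient2.domain.com\tclient2\n"
    dns_name = "client1.domain.com"  # what nslookup on the server reports
    local = hosts_name_for_ip(hosts, "10.0.0.5")
    if local is not None and local != dns_name:
        print(f"/etc/hosts maps the IP to {local}, DNS says {dns_name}")
```

Running resolver_name_for_ip with the client's real address, on the client itself, shows which name the local resolver returns with and without the /etc/hosts entry in place.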

Any suggestions/ideas please?

Thanks,
Prakash

