[torqueusers] Getting "qsub: Job rejected by all possible destinations"

Steve Young chemadm at hamilton.edu
Thu Feb 26 18:28:36 MST 2009


Hi,
	I'll take a stab at it ... ;-). It doesn't look like a host problem
to me. It looks more like you're asking for some kind of resource or
queue that the system can't fulfill. If you look at your log, the server
tries to run your job, but the place you're routing to doesn't have the
resources you asked for. Does it just sit there queued forever? Anyhow,
that's what it looks like to me without knowing what you asked for and
how your queues are set up.
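
If it helps, here is a rough Python sketch for dumping what each queue
is configured with, so you can see where the routing queue sends jobs
and what limits or host ACLs the destinations carry. It assumes that
`qmgr -c "print server"` works on your server and prints `set queue ...`
lines. The function names and the SAMPLE text are made up for
illustration and are not taken from your setup.

```python
import re
import subprocess

# Matches lines such as: set queue routing route_destinations += long
_SET_RE = re.compile(r"^set queue (\S+) (\S+) (\+?=) (.*)$")


def parse_queue_settings(qmgr_output):
    """Collect 'set queue' lines into {queue: {attribute: [values]}}."""
    queues = {}
    for line in qmgr_output.splitlines():
        match = _SET_RE.match(line.strip())
        if not match:
            continue  # skip 'create queue', 'set server', comments, blanks
        queue, attr, _op, value = match.groups()
        # '=' and '+=' both append, so multi-valued attributes keep every value
        queues.setdefault(queue, {}).setdefault(attr, []).append(value.strip())
    return queues


def fetch_queue_settings():
    """Run qmgr on the server host and parse its output (needs qmgr in PATH)."""
    result = subprocess.run(
        ["qmgr", "-c", "print server"],
        capture_output=True, text=True, check=True,
    )
    return parse_queue_settings(result.stdout)


# Hypothetical output, only to show the shape of the parsed result.
SAMPLE = """\
create queue routing
set queue routing queue_type = Route
set queue routing route_destinations = short
set queue routing route_destinations += long
create queue short
set queue short queue_type = Execution
set queue short resources_max.walltime = 01:00:00
set queue short acl_host_enable = True
set queue short acl_hosts = client1.domain.com
"""

if __name__ == "__main__":
    for name, attrs in parse_queue_settings(SAMPLE).items():
        print(name, attrs)
```

On the real server you would call fetch_queue_settings() instead of
parsing SAMPLE, then compare each destination's resources_max,
resources_min and acl_hosts values against what the job asked for and
the host it was submitted from.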

-Steve

On Feb 25, 2009, at 10:41 AM, Prakash Velayutham wrote:

> Hi,
>
> Torque-2.3.6 Server and MOMs
> Torque-2.1.10 submission client
>
> I am getting "qsub: Job rejected by all possible destinations" on  
> the client side. Here are some details which are baffling me.
>
> My client name is client1.domain.com and IP is x.y.z. This entry is  
> in the DNS server.
>
> In the client's /etc/hosts file, I have an entry called
>
> x.y.z	client2.domain.com	client2
>
> This entry has existed for a while, and the same jobs used to work.
> But suddenly, since around 09:35 this morning, I started getting
> the rejection from the Torque server. Once I remove the entry from
> /etc/hosts, jobs go in fine.
>
> I have the client1.domain.com in Torque server's /etc/hosts.equiv  
> and the client1.domain.com in qmgr's acl_hosts too.
>
> Even now, if I add the same entry back to /etc/hosts, I get  
> rejections. I have no idea why this is happening because if I do  
> nslookup on the Torque server for the client's IP address x.y.z, I  
> get back client1.domain.com. This is baffling and disturbing.  
> Following are the relevant entries in the Torque server logs.
>
> 02/25/2009 09:35:02;0100;PBS_Server;Req;;Type AuthenticateUser request received from user@client1.domain.com, sock=13
> 02/25/2009 09:35:02;0100;PBS_Server;Req;;Type QueueJob request received from user@client1.domain.com, sock=10
> 02/25/2009 09:35:02;0100;PBS_Server;Req;;Type JobScript request received from user@client1.domain.com, sock=10
> 02/25/2009 09:35:02;0100;PBS_Server;Req;;Type ReadyToCommit request received from user@client1.domain.com, sock=10
> 02/25/2009 09:35:02;0100;PBS_Server;Req;;Type Commit request received from user@client1.domain.com, sock=10
> 02/25/2009 09:35:02;0100;PBS_Server;Job;2799.bmiclustersvcd1.cchmc.org;enqueuing into routing, state 1 hop 1
> 02/25/2009 09:35:02;0008;PBS_Server;Job;2799.bmiclustersvcd1.cchmc.org;Job rejected by all possible destinations
> 02/25/2009 09:35:02;0100;PBS_Server;Job;2799.bmiclustersvcd1.cchmc.org;dequeuing from routing, state QUEUED
> 02/25/2009 09:35:02;0080;PBS_Server;Req;req_reject;Reject reply code=15039(Job rejected by all possible destinations), aux=0, type=Commit, from user@client1.domain.com
>
> Any suggestions/ideas please?
>
> Thanks,
> Prakash
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


