[torqueusers] server rejected job obit - 15008

Garrick Staples garrick at clusterresources.com
Wed Aug 2 08:25:43 MDT 2006


On Wed, Aug 02, 2006 at 08:26:11AM +0200, Francisco Jose Bernabe Pellicer alleged:
> Hi everybody,
> 
> I installed Torque/Maui in a cluster and everything was working fine. 
> Torque/Maui is installed on a server (the Computing Element) under a 
> middleware called gLite. I moved this service (the Computing Element) to a 
> better server, so I also had to move Torque/Maui to that server. 
> Now I'm getting this problem on the Execution Nodes 
> (mom_log):
> 
> 08/02/2006 05:18:37;0080;   pbs_mom;Req;req_reject;Reject reply code=15001, aux=0, type=18, from PBS_Server@<COMPUTING_ELEMENT>
> 08/02/2006 05:18:38;0080;   pbs_mom;Req;req_reject;Reject reply code=15001, aux=0, type=11, from PBS_Server@<COMPUTING_ELEMENT>
> 08/02/2006 05:18:39;0080;   pbs_mom;Job;238.<COMPUTING_ELEMENT>;using transient tmpdir /var/spool/pbs/238.<COMPUTING_ELEMENT>
> 08/02/2006 05:18:39;0008;   pbs_mom;Job;238.<COMPUTING_ELEMENT>;Started, pid = 25091
> 08/02/2006 05:19:33;0080;   pbs_mom;Job;238.<COMPUTING_ELEMENT>;scan_for_terminated: task 1 terminated, sid 25091
> 08/02/2006 05:19:33;0008;   pbs_mom;Job;238.<COMPUTING_ELEMENT>;Terminated
> 08/02/2006 05:19:33;0080;   pbs_mom;Job;238.<COMPUTING_ELEMENT>;Removing transient job directory /var/spool/pbs/238.<COMPUTING_ELEMENT>
> 08/02/2006 05:19:33;0080;   pbs_mom;Job;238.<COMPUTING_ELEMENT>;Obit sent
> 08/02/2006 05:19:33;0001;   pbs_mom;Job;238.<COMPUTING_ELEMENT>;server rejected job obit - 15008
> 
> 
> Do you think the mom on the WNs is still pointing to the old server? I think 
> it uses only the hostname and not the IP, doesn't it? Let me know.

pbs_mom's configuration refers to the server by hostname, but that name
is looked up only once. If the server's IP address changed, pbs_mom has
to be restarted so that it resolves the hostname again.
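
To illustrate why a restart is needed, here is a rough Python sketch of
the "look up once" pattern. It is not pbs_mom's actual code: the class
name, the hostname ce.example.org, the addresses, and the dictionary
standing in for DNS are all made up for the example.

```python
from typing import Callable, Dict


class CachedServerAddress:
    """Hypothetical stand-in for a daemon that resolves its server once."""

    def __init__(self, hostname: str, resolve: Callable[[str], str]) -> None:
        self.hostname = hostname
        # The lookup happens here, at construction ("startup"), and the
        # result is kept for the lifetime of the object.
        self.address = resolve(hostname)


# Made-up name table standing in for real DNS or /etc/hosts.
dns: Dict[str, str] = {"ce.example.org": "192.0.2.10"}


def lookup(name: str) -> str:
    """Return the address currently registered for name."""
    return dns[name]


# The daemon starts while the server still lives at the old address.
daemon = CachedServerAddress("ce.example.org", lookup)

# The service is moved to a new machine; only the name table changes.
dns["ce.example.org"] = "192.0.2.20"

# The running daemon still holds the address it resolved at startup.
stale = daemon.address

# "Restarting" (constructing a new object) resolves the name again.
daemon = CachedServerAddress("ce.example.org", lookup)
fresh = daemon.address

print(stale, fresh)
```

In this sketch the running object keeps talking to the old address until
it is recreated, which is the situation a restart of pbs_mom on each
node is meant to clear.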


