[torqueusers] server rejected job obit - 15008
Garrick Staples
garrick at clusterresources.com
Wed Aug 2 08:25:43 MDT 2006
On Wed, Aug 02, 2006 at 08:26:11AM +0200, Francisco Jose Bernabe Pellicer alleged:
> Hi everybody,
>
> I installed Torque/Maui in a cluster and everything was working fine.
> Torque/Maui is installed on a server (Computing Element) under a
> middleware called gLite. I moved this service (Computing Element) to a
> better server, so I also had to move Torque/Maui to the same server.
> The thing is that I'm getting this problem from the Execution Nodes
> (mom_log):
>
> 08/02/2006 05:18:37;0080; pbs_mom;Req;req_reject;Reject reply
> code=15001, aux=0, type=18, from PBS_Server@<COMPUTING_ELEMENT>
> 08/02/2006 05:18:38;0080; pbs_mom;Req;req_reject;Reject reply
> code=15001, aux=0, type=11, from PBS_Server@<COMPUTING_ELEMENT>
> 08/02/2006 05:18:39;0080; pbs_mom;Job;238.<COMPUTING_ELEMENT>;using
> transient tmpdir /var/spool/pbs/238.<COMPUTING_ELEMENT>
> 08/02/2006 05:18:39;0008; pbs_mom;Job;238.<COMPUTING_ELEMENT>;Started,
> pid = 25091
> 08/02/2006 05:19:33;0080;
> pbs_mom;Job;238.<COMPUTING_ELEMENT>;scan_for_terminated: task 1
> terminated, sid 25091
> 08/02/2006 05:19:33;0008; pbs_mom;Job;238.<COMPUTING_ELEMENT>;Terminated
> 08/02/2006 05:19:33;0080; pbs_mom;Job;238.<COMPUTING_ELEMENT>;Removing
> transient job directory /var/spool/pbs/238.<COMPUTING_ELEMENT>
> 08/02/2006 05:19:33;0080; pbs_mom;Job;238.<COMPUTING_ELEMENT>;Obit sent
> 08/02/2006 05:19:33;0001; pbs_mom;Job;238.<COMPUTING_ELEMENT>;server
> rejected job obit - 15008
>
>
> Do you think the mom on the WNs is still pointing to the old server? I
> think it uses only the name and not the IP, doesn't it? Let me know.
pbs_mom's configuration refers to the server by hostname, but that name is
only resolved to an IP address once. pbs_mom needs to be restarted if the
server's IP has changed.
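
To illustrate why a name that is resolved only once can go stale, here is a
minimal Python sketch. It is not part of TORQUE: the function name
check_server_address, the hostname, and the addresses are all hypothetical.
It only compares the address a long-running daemon would still be holding
with what the name resolves to now.

```python
import socket


def check_server_address(hostname, cached_ip, resolve=socket.gethostbyname):
    """Compare an address cached earlier with a fresh lookup of the name.

    hostname  -- server name as written in the daemon's configuration
    cached_ip -- address obtained when the name was first resolved
    resolve   -- lookup function; replaceable so the check can run offline

    Returns (is_stale, current_ip).
    """
    current_ip = resolve(hostname)
    return current_ip != cached_ip, current_ip


# Offline demonstration with a stand-in resolver (hypothetical values):
# the name now maps to a new address, but the cached one is the old address.
fake_dns = {"ce.example.org": "192.0.2.20"}
stale, current = check_server_address(
    "ce.example.org", "192.0.2.10", resolve=fake_dns.__getitem__
)
print(stale, current)
```

Run on an execution node with the real server name and the old address, a
stale result would suggest that a daemon started before the move is still
talking to the old IP and needs a restart.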