[torqueusers] pbs_server error 15008
Brock Palen
brockp at umich.edu
Wed Apr 30 14:40:45 MDT 2008
On Apr 30, 2008, at 4:31 PM, Adrian Sevcenco wrote:
> Steve Snelgrove wrote:
>> You should run the command "momctl -d3" on the mom and see if the
>> server address is in the trusted client list. If not, you are
>> experiencing the effect of my goof in the code, which is now
>> corrected in the latest snapshots.
>> The latest snapshots can be obtained at
>> http://www.clusterresources.com/downloads/torque/snapshots/.
> Hi! Thanks for looking into this. I don't have momctl on the nodes,
> and on the server it gives:
> [root at grid01 bin]# momctl -d3
> ERROR: query[0] 'diag3' failed on localhost (errno: 0:5)
> The version is 2.1.9, packaged by EGEE for the gLite (grid) install,
> so I suspect something in my configuration.
> Thank you,
> Adrian
When you run momctl from another host, you have to tell it which
host's mom to connect to. Otherwise it defaults to localhost.
For example, on nyx559:
[root at nyx559 ~]# momctl -h nyx555 -d3
Host: nyx555.engin.umich.edu/nyx555.engin.umich.edu Version: 2.1.9
<snip>
Brock Palen
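Putting Steve's and Brock's suggestions together, the check can be scripted from the server host. This is a sketch, not something from the thread: it assumes momctl is in the PATH and that `momctl -h NODE -d3` prints a line beginning with "Trusted Client List" followed by comma-separated entries (the exact entry format may differ between TORQUE versions). The function names show_trusted_clients and server_is_trusted are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch (not from the thread): query pbs_mom diagnostics from the server
# host and inspect the "Trusted Client List" line.
# Assumptions: momctl is in PATH, and `momctl -h NODE -d3` prints a line
# starting with "Trusted Client List" followed by comma-separated entries.
# The exact entry format may vary between TORQUE versions.

# Print each node's "Trusted Client List" line (or a note if none is found).
show_trusted_clients() {
  local node
  for node in "$@"; do
    printf '%s: ' "$node"
    # -h picks the remote mom; without it momctl queries localhost.
    if ! momctl -h "$node" -d3 2>&1 | grep -m1 'Trusted Client List'; then
      echo 'no "Trusted Client List" line found (did the query fail?)'
    fi
  done
}

# Return 0 if ADDR is in NODE's trusted client list, 1 otherwise.
server_is_trusted() {
  local node=$1 addr=$2 list entry
  list=$(momctl -h "$node" -d3 2>/dev/null |
    sed -n 's/^[[:space:]]*Trusted Client List:[[:space:]]*//p' | head -n 1)
  local IFS=','
  for entry in $list; do
    entry=${entry// /}   # drop stray spaces
    entry=${entry%%:*}   # drop any ":suffix" some versions may append
    [[ $entry == "$addr" ]] && return 0
  done
  return 1
}
```

For example, `server_is_trusted wn02 SERVER_ADDR || echo "wn02 does not trust SERVER_ADDR"`, where SERVER_ADDR is a placeholder for the address the server actually connects from, which is not necessarily the one listed in a hosts file.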
>
>> Adrian Sevcenco wrote:
>>> Hi! I have an installation of TORQUE and I am trying to submit some
>>> test jobs, but all jobs stop with the status Deferred, and on the
>>> server side I receive errors like these:
>>> 04/30/2008 22:37:57;0040;PBS_Server;Svr;grid01.x.x;Scheduler sent command new
>>> 04/30/2008 22:37:57;0100;PBS_Server;Req;;Type ModifyJob request received from root at grid01.x.x, sock=9
>>> 04/30/2008 22:37:57;0008;PBS_Server;Job;133.grid01.x.x;Job Modified at request of root at grid01.x.x
>>> 04/30/2008 22:37:57;0100;PBS_Server;Req;;Type RunJob request received from root at grid01.x.x, sock=9
>>> 04/30/2008 22:37:57;0008;PBS_Server;Job;133.grid01.x.x;Job Run at request of root at grid01.x.x
>>> 04/30/2008 22:37:57;0008;PBS_Server;Job;133.grid01.x.x;send of job to wn02 failed error = 15008
>>> 04/30/2008 22:37:57;0001;PBS_Server;Svr;PBS_Server;Access from host not allowed, or unknown host (15008) in send_job, child failed in previous commit request for job 133.grid01.x.x
>>> 04/30/2008 22:37:57;0008;PBS_Server;Job;133.grid01.x.x;unable to run job, MOM rejected/rc=1
>>> 04/30/2008 22:37:57;0080;PBS_Server;Req;req_reject;Reject reply code=15041(Execution server rejected request MSG=cannot send job to mom, state=PRERUN), aux=0, type=RunJob, from root at grid01.x.x
>>>
>>> On the node, mom_log contains this:
>>> 04/30/2008 22:18:33;0008;pbs_mom;Job;process_request;request type QueueJob from host grid01.x.x rejected (host not authorized)
>>> 04/30/2008 22:18:33;0080;pbs_mom;Req;req_reject;Reject reply code=15008(Access from host not allowed, or unknown host MSG=request not authorized), aux=0, type=QueueJob, from PBS_Server at grid01.x.x
>>>
>>> I have public key authentication set up. I don't know how to pursue
>>> the problem further and would appreciate any advice on tracking it down.
>>> I also see:
>>> 04/30/2008 22:19:32;0001;pbs_mom;Svr;pbs_mom;is_request, bad connect from public_ip:1023 - unauthorized server
>>> But in the hosts file I put only the private IP. Why aren't the
>>> hostname and private IP that I put in the hosts file used?
>>> Thank you for any advice you can give me.
>>> Best regards,
>>> Adrian
>>>
>>> -------------------------------------------------------
>>> Adrian Sevcenco - Institute of Space Sciences, Romania
>>> -------------------------------------------------------
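The "bad connect from public_ip:1023" line suggests the server's connection reached wn02 from its public address, although only the private address was configured. The thread does not settle why. One general point: a TCP client that does not bind to a specific local address gets whichever source address the routing table selects for the destination, and that need not match what a hosts file says. As a sketch (not from the thread), the rejected source addresses can be pulled out of a mom log so they can be compared with what the node resolves for the server name. The function name rejected_sources is hypothetical, and the log format is assumed to match the excerpt quoted above.

```shell
#!/usr/bin/env bash
# Sketch (not from the thread): print the distinct source addresses that
# pbs_mom rejected, assuming log lines shaped like the excerpt above:
#   ...;pbs_mom;Svr;pbs_mom;is_request, bad connect from ADDR:PORT - unauthorized server
# Pass mom log files as arguments, or pipe log text on stdin.
rejected_sources() {
  # Keep only the ADDR part of matching lines, then de-duplicate.
  sed -n 's/.*bad connect from \([^: ]*\):[0-9]* - unauthorized server.*/\1/p' "$@" |
    sort -u
}
```

To compare, `ip route get NODE_ADDR` (iproute2) on the server host reports the source address (`src`) the kernel would pick for that node, and `getent hosts grid01.x.x` on the node shows what the server name resolves to there. NODE_ADDR is a placeholder, and the mom log location depends on the install. If the two addresses differ, that would be consistent with the "unauthorized server" rejection.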
>>>
>>> _______________________________________________
>>> torqueusers mailing list
>>> torqueusers at supercluster.org
>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>