[torqueusers] Re: error in pbs_iff: cannot read reply from pbs_server

Guilherme Menegon Arantes garantes at iq.usp.br
Thu Sep 18 14:23:47 MDT 2008


> On Wed, Sep 17, 2008 at 03:38:38PM -0300, Guilherme Menegon Arantes wrote:
> > 
> > Dear Torque users,
> > 
> > My Torque installation works fine, but when I submit a large number
> > of jobs in a row (say more than 10 or 15), I get the following error
> > message:
> > 
> > pbs_iff: cannot read reply from pbs_server
> > No Permission.
> > qsub: cannot connect to server node5 (errno=15007)
> > 
> > where node5 is my Torque server. This error is seen with qsub,
> > qstat and pbsnodes alike, every time a large number of jobs is submitted.
> > Checking the server logs, I see errors like:
> > 
> > 09/17/2008 09:58:33;0080;PBS_Server;Req;req_reject;Reject reply code=15019(Invalid credential MSG=cannot authenticate user), aux=0, type=AuthenticateUser, from garantes at node5.full_server_name
> > 
> > where the server's full domain name was not copied here, but is shown
> > in the logs. I am running Torque 2.3.0 and this error is seen with
> > either the default pbs_sched or Maui (3.2.6p19) running as the scheduler.
> 
> If you haven't figured this out already, check the permissions on
> pbs_iff on all your systems.  Make sure that it has the setuid bit set.


The setuid bit was already set. Thanks anyway.
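
A minimal sketch of how that check could be repeated on each node. It
assumes Python is available and that the path to pbs_iff is passed on the
command line, since the install location differs between systems. The
function names are illustrative only and are not part of Torque. The
owner uid is printed as well, because the setuid bit only takes effect
relative to the file's owner.

```python
import os
import stat
import sys


def mode_has_setuid(mode):
    """Return True if a raw st_mode value has the setuid bit set."""
    return bool(mode & stat.S_ISUID)


def describe(path):
    """Return (has_setuid, owner_uid, octal permission string) for path."""
    st = os.stat(path)
    perms = oct(stat.S_IMODE(st.st_mode))
    return mode_has_setuid(st.st_mode), st.st_uid, perms


if __name__ == "__main__":
    # Usage (the path is an assumption, adjust it to your install):
    #   python check_setuid.py /path/to/pbs_iff
    for path in sys.argv[1:]:
        setuid, uid, perms = describe(path)
        status = "setuid set" if setuid else "setuid NOT set"
        print(f"{path}: {status}, owner uid={uid}, mode={perms}")
```
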

Another data point: this behaviour is not reproducible every time. For
instance, I was able to submit 100 jobs in a row without problems
today... Strange.

Let me know if anyone needs further logs or info about my cluster.
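
When gathering such logs, the reject entries can be pulled out with a
short script. This is only a sketch, assuming the semicolon-separated
layout of the server log line quoted above. The name parse_reject and the
regular expression are illustrative and not part of Torque, and the log
file path has to be given on the command line.

```python
import re
import sys

# Matches the message part of a req_reject entry as quoted above.
REJECT_RE = re.compile(
    r"Reject reply code=(\d+)\((.*?)\), aux=(-?\d+), type=(\w+), from (.+)"
)


def parse_reject(line):
    """Parse one server log line; return a dict for req_reject entries, else None."""
    fields = line.rstrip("\n").split(";", 5)
    if len(fields) != 6:
        return None
    timestamp, _event, _server, _obj_class, name, message = fields
    if name != "req_reject":
        return None
    match = REJECT_RE.match(message)
    if match is None:
        return None
    return {
        "timestamp": timestamp,
        "code": int(match.group(1)),
        "reason": match.group(2),
        "type": match.group(4),
        "from": match.group(5),
    }


if __name__ == "__main__":
    # Usage: python rejects.py <server log file>
    with open(sys.argv[1]) as log:
        for raw in log:
            entry = parse_reject(raw)
            if entry is not None:
                print(entry["timestamp"], entry["code"], entry["type"], entry["from"])
```

Counting the matches per minute would show whether the rejects cluster
around the bursts of submissions.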

Kind regards,

G

--

Guilherme Menegon Arantes, PhD       São Paulo, Brasil
______________________________________________________


