[torqueusers] No Permission. qstat: cannot connect to server torque-server (errno=15007) Unauthorized Request
knielson at adaptivecomputing.com
Thu Mar 31 08:43:32 MDT 2011
On 03/30/2011 07:08 PM, Aleksandr Levchuk wrote:
> Dear Torque Experts,
> We upgraded our OS from Debian 5 to Debian 6 and consequently upgraded Torque.
> Now qstat and qsub work for about 1 minute and then fail for the next minute.
> I have torque-2.5.5 (I also tried 2.4.8, which had the same issue).
> When we run qstat, it works half of the time, and the other half we get:
> pbs_iff: cannot read reply from pbs_server
> No Permission.
> qstat: cannot connect to server torque-server (errno=15007) Unauthorized Request
> In the mom syslog:
> pbs_mom: LOG_ERROR::Operation now in progress (115) in
> TMomFinalizeChild, cannot open interactive qsub socket to host
> girkelab-3.ucr.edu:51056 - 'cannot connect to port 777 in
> client_to_svr - errno:115 Operation now in progress' - check routing
> tables/multi-homed host issues
> On the server:
> /opt/torque-2.5.5/bin/qmgr -c 'print server'
> # Create queues and set their attributes.
> # Create and define queue batch
> create queue batch
> set queue batch queue_type = Execution
> set queue batch resources_default.nodes = 1
> set queue batch enabled = True
> set queue batch started = True
> # Set server attributes.
> set server scheduling = True
> set server acl_hosts = torque-server
> set server acl_hosts += torque-server+biocluster+parrot+owl
> set server acl_hosts += owl-33+biocluster-33
> set server acl_hosts += girkelab-3+girkelab-4
> set server operators = root@torque-server
> set server default_queue = batch
> set server log_events = 511
> set server mail_from = adm
> set server query_other_jobs = True
> set server scheduler_iteration = 600
> set server node_check_rate = 150
> set server tcp_timeout = 6
> set server log_level = 0
> set server submit_hosts = biocluster+parrot+owl
> set server submit_hosts += girkelab-3+girkelab-4
> set server submit_hosts += owl-33+biocluster-33
> set server allow_node_submit = True
> set server next_job_number = 206082
> Why does it say permission error when it works half of the time?
> What can I do to diagnose the problem?
pbs_iff contacts pbs_server as root to vouch for the user connection
that the client opened just before calling it. In this case pbs_iff did
connect to pbs_server and sent the PBS_BATCH_AuthenUser request, but the
server never sent a reply back. That missing reply is what produces the
"cannot read reply from pbs_server" and "Unauthorized Request" errors.
You may want to set the log level to 6 on the server and check whether
the request arrives there.
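
The log-level change suggested above can be sketched as the commands below. This is a minimal sketch, not a verified procedure for this cluster: it assumes qmgr is run on the server by root or a Torque manager, and the log location (/var/spool/torque/server_logs, one file per day) is the common default and only an assumption here, since this install lives under /opt/torque-2.5.5 and may use a different TORQUE_HOME.

```shell
# Raise pbs_server logging verbosity (run on the server as root or a Torque manager).
qmgr -c 'set server log_level = 6'

# In a second terminal, follow today's server log while reproducing the qstat
# failure from a client, and look for the incoming AuthenUser request.
# ASSUMPTION: TORQUE_HOME defaults to /var/spool/torque; adjust for this install.
TORQUE_HOME=${TORQUE_HOME:-/var/spool/torque}
tail -f "$TORQUE_HOME/server_logs/$(date +%Y%m%d)"

# When finished, restore the previous level (the configuration above shows 0).
qmgr -c 'set server log_level = 0'
```

If the request shows up in the log during a failing minute, the server received it and did not answer; if it does not show up at all, the problem is more likely on the path between pbs_iff and pbs_server.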