[torqueusers] No Permission. qstat: cannot connect to server torque-server (errno=15007) Unauthorized Request

Aleksandr Levchuk alevchuk at gmail.com
Thu Mar 31 09:12:53 MDT 2011


I solved the problem:
The server was jammed because of a dead node.

Some details are here:
http://serverfault.com/questions/253932/torque-works-half-of-the-time-fails-no-permission-the-other-half
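
For anyone hitting the same symptom, here is a minimal sketch of one way to look for dead nodes. It assumes the usual `pbsnodes -a` layout: an unindented node name followed by indented `key = value` lines, including `state = ...`. The helper name list_unhealthy_nodes is hypothetical and not part of Torque.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Torque): reads `pbsnodes -a` style
# output on stdin and prints the name of every node whose state mentions
# "down", "offline" or "unknown".
list_unhealthy_nodes() {
    awk '
        # An unindented, non-empty line starts a new node record (the node name).
        /^[^ \t]/ { node = $1; next }
        # An indented "state = ..." line belongs to the current node.
        $1 == "state" && $2 == "=" {
            if ($3 ~ /down|offline|unknown/) print node
        }
    '
}
```

Typical use on the server would be `pbsnodes -a | list_unhealthy_nodes`. Running `pbsnodes -l` should give a similar list without any scripting; the function is only useful if you want to feed the names into something else.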

Alex

On Thu, Mar 31, 2011 at 7:43 AM, Ken Nielson
<knielson at adaptivecomputing.com> wrote:
> On 03/30/2011 07:08 PM, Aleksandr Levchuk wrote:
>> Dear Torque Experts,
>>
>> We upgraded our OS from Debian 5 to Debian 6 and consequently upgraded Torque.
>>
>> Now qstat and qsub work for about 1 minute and then fail for another minute.
>>
>> I have torque-2.5.5 (but I tried 2.4.8 and it had the same issues).
>>
>>
>> When we run qstat half of the time it works and half of the time we get:
>> ===============================
>> pbs_iff: cannot read reply from pbs_server
>> No Permission.
>> qstat: cannot connect to server torque-server (errno=15007) Unauthorized Request
>> ===============================
>>
>> On the mom syslog
>> ===============================
>> pbs_mom: LOG_ERROR::Operation now in progress (115) in
>> TMomFinalizeChild, cannot open interactive qsub socket to host
>> girkelab-3.ucr.edu:51056 - 'cannot connect to port 777 in
>> client_to_svr - errno:115 Operation now in progress' - check routing
>> tables/multi-homed host issues
>> ===============================
>>
>> On the server
>> ===============================
>> /opt/torque-2.5.5/bin/qmgr -c 'print server'
>> #
>> # Create queues and set their attributes.
>> #
>> #
>> # Create and define queue batch
>> #
>> create queue batch
>> set queue batch queue_type = Execution
>> set queue batch resources_default.nodes = 1
>> set queue batch enabled = True
>> set queue batch started = True
>> #
>> # Set server attributes.
>> #
>> set server scheduling = True
>> set server acl_hosts = torque-server
>> set server acl_hosts += torque-server+biocluster+parrot+owl
>> set server acl_hosts += owl-33+biocluster-33
>> set server acl_hosts += girkelab-3+girkelab-4
>> set server operators = root@torque-server
>> set server default_queue = batch
>> set server log_events = 511
>> set server mail_from = adm
>> set server query_other_jobs = True
>> set server scheduler_iteration = 600
>> set server node_check_rate = 150
>> set server tcp_timeout = 6
>> set server log_level = 0
>> set server submit_hosts = biocluster+parrot+owl
>> set server submit_hosts += girkelab-3+girkelab-4
>> set server submit_hosts += owl-33+biocluster-33
>> set server allow_node_submit = True
>> set server next_job_number = 206082
>> ===============================
>>
>> Why does it say permission error when it works half of the time?
>>
>> What can I do to diagnose the problem?
>>
>>
>> Alex
>>
> Alex,
>
> pbs_iff tries to contact pbs_server as root to vouch for a user
> connection made just prior to its call. The connection from pbs_iff to
> pbs_server has been made. The PBS_BATCH_AuthenUser request was sent but
> no response was sent back from the server. That is why the error
> occurred. You may want to set the log level to 6 on the server and see
> if the request arrives at the server.
>
> Ken Nielson
> Adaptive Computing
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>



-- 
----------------------------------------------------------------
Aleksandr Levchuk
Bioinformatics Systems and Databases
http://facility.bioinformatics.ucr.edu/people/aleksandr-levchuk

Cell Phone: (951) 368-0004
Lab Phone: (951) 905-5232

Institute for Integrative Genome Biology
University of California, Riverside
---------------------------------------------------------------

