[torqueusers] installation/configuration problem with multi-homed system. --- unauthorized host/request

liu junjun ljjlp03 at gmail.com
Sun Nov 13 08:48:29 MST 2011


Hi everyone,

I am trying to install torque-3.0.2 on a multi-homed system (two network
interfaces) but am running into an authorization problem. Please see my
description of the problem below. Any help is highly appreciated!

---- System information ----
OS: Ubuntu 10.10
eth0: external_host_name
eth1: internal_host_name
hostname: internal_host_name
--------------------------------------------

---- Basic Torque information ----
Torque version: 3.0.2
content of /var/spool/torque/server_name: internal_host_name
content of /var/spool/torque/torque.cfg: SERVERHOST internal_host_name

server and nodes can ping each other with internal_host_name
----------------------------------------
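
Since the whole question turns on which address each name maps to, here is
a minimal sketch for checking that from a hosts-format file. The function
name addrs_for_name is made up for illustration, and it assumes the names
are defined in /etc/hosts rather than only in DNS:

```shell
#!/bin/sh
# Print the IP address(es) that a hosts-format file assigns to a given name.
# $1 = host name to look up, $2 = hosts file (defaults to /etc/hosts).
# Hypothetical helper, not part of Torque.
addrs_for_name() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }          # strip trailing comments
        NF >= 2 {
            # field 1 is the address, the remaining fields are names/aliases
            for (i = 2; i <= NF; i++)
                if ($i == n) print $1
        }' "$file"
}

# Example (commented out so the sketch has no side effects):
# addrs_for_name internal_host_name
# addrs_for_name external_host_name
```

If both names print the same address, or the internal name prints the
address of eth0, that would be worth mentioning in a follow-up.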


---- the problem -------------
1. My first installation attempt:
Following the installation document at
http://www.adaptivecomputing.com/resources/docs/torque/1.1installation.php,
I had a problem with the "torque.setup" script: it returned "unauthorized
request". I suspected that the problem was related to my two NICs, so
I double-checked the server_name file and also added "SERVERHOST
internal_host_name" to torque.cfg. Unfortunately, the problem still remains.

2. My second installation attempt:
I removed the first installation, disabled eth0 (which is associated
with external_host_name), and recompiled torque with exactly the same
steps as in my first attempt. Everything seemed fine: I could create a
batch queue and submit jobs, which ran and terminated
normally. However, once I enable eth0 (external_host_name), every qmgr
command returns "unauthorized request". I noticed that the server
identifies me as user@external_host_name, whereas the pbs server is set to
internal_host_name, which is also the hostname. I guess this causes the
"unauthorized" issue, so I disabled eth0 again to regain access and made
the following settings:
====
qmgr -c 's s acl_hosts += external_host_name'
qmgr -c 's s managers += root@external_host_name'
qmgr -c 's s operators += root@external_host_name'
qmgr -c 's s submit_hosts += external_host_name'
====
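
For anyone who wants to repeat these settings for another host, the
following is a rough sketch that only prints the commands in their long
form ('set server' instead of 's s') so they can be reviewed before being
run. The function name acl_commands_for_host is hypothetical, and it
assumes the admin account is root unless a second argument is given:

```shell
#!/bin/sh
# Print (do not execute) the qmgr commands that give a host the same
# access as in the settings above.
# $1 = host name, $2 = admin user (assumed to be root if omitted).
# Hypothetical helper, not part of Torque.
acl_commands_for_host() {
    host=$1
    admin=${2:-root}
    printf "qmgr -c 'set server acl_hosts += %s'\n" "$host"
    printf "qmgr -c 'set server managers += %s@%s'\n" "$admin" "$host"
    printf "qmgr -c 'set server operators += %s@%s'\n" "$admin" "$host"
    printf "qmgr -c 'set server submit_hosts += %s'\n" "$host"
}

# Example (commented out): review the output, then pipe it to sh as root.
# acl_commands_for_host external_host_name
```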

After the above commands, I have operational access to the pbs_server
even when eth0 is enabled. However, all submitted jobs still
remain in the Q state. The following are excerpts from the 'qstat -f'
output and the log files on the server:
==== part of 'qstat -f' command =====
Job Id: 51.internal_host_name
    Job_Name = STDIN
    Job_Owner = user@external_host_name
    job_state = Q
    queue = batch
    server = internal_host_name
    Checkpoint = u
    ctime = Sun Nov 13 19:25:12 2011
    Error_Path = internal_host_name:/home/liu/STDIN.e51
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Sun Nov 13 19:25:12 2011
    Output_Path = internal_host_name:/home/liu/STDIN.o51
===============================
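
To see at a glance which owner the server recorded for each queued job, a
small filter over 'qstat -f' output could look like the sketch below. The
function name queued_job_owners is invented for illustration, and it
assumes the owner is printed as a single user@host token and that
Job_Owner comes before job_state, as in the excerpt above:

```shell
#!/bin/sh
# Read 'qstat -f' output (from files or stdin) and print
# "<job id> <job owner>" for every job whose state is Q.
# Hypothetical helper, not part of Torque.
queued_job_owners() {
    awk '
        /^Job Id:/        { id = $3 }        # e.g. 51.internal_host_name
        $1 == "Job_Owner" { owner = $3 }     # value after "Job_Owner ="
        $1 == "job_state" { if ($3 == "Q") print id, owner }
    ' "$@"
}

# Example (commented out):
# qstat -f | queued_job_owners
```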

==== part of pbs_server log ======
11/13/2011 19:25:05;0002;PBS_Server;Svr;PBS_Server;Torque Server Version =
3.0.2, loglevel = 0
11/13/2011 19:25:12;0100;PBS_Server;Job;51.internal_host_name;enqueuing into
batch, state 1 hop 1
11/13/2011 19:25:12;0008;PBS_Server;Job;51.internal_host_name;Job Queued at
request of user@external_host_name, owner = user@external_host_name, job
name = STDIN, queue = batch
11/13/2011 19:25:12;0040;PBS_Server;Svr;cddlogin;Scheduler was sent the
command new
11/13/2011 19:25:12;0080;PBS_Server;Req;dis_request_read;req header bad,
dis error 7 (Premature end of message), type=Connect
11/13/2011 19:25:12;0080;PBS_Server;Req;req_reject;Reject reply
code=15058(Bad DIS based Request Protocol MSG=cannot decode message),
aux=0, type=Connect, from @
11/13/2011 19:25:12;0002;PBS_Server;Req;dis_reply_write;DIS reply failure,
-1
=========================

==== part of pbs_sched log ======
11/13/2011 19:25:12;0001; pbs_sched;Svr;pbs_sched;LOG_ERROR::badconn,
external_host_name on port 762 unauthorized host
==========================
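
To list every host that pbs_sched has rejected this way, something like the
following sketch could be run over the scheduler log. The function name
unauthorized_hosts is made up, and it assumes each entry sits on a single
line in the real log file (the line break above comes from the mail
client):

```shell
#!/bin/sh
# Print the distinct host names that appear in pbs_sched
# "badconn, <host> on port <n> unauthorized host" log entries.
# Reads the named log files, or stdin if none are given.
# Hypothetical helper, not part of Torque.
unauthorized_hosts() {
    sed -n 's/.*badconn, \([^ ]*\) on port [0-9]* unauthorized host.*/\1/p' "$@" |
        sort -u
}

# Example (commented out; replace the placeholder with the real log path):
# unauthorized_hosts <path to pbs_sched log>
```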

As you can see from the above information, although external_host_name is
set as a submit_host, all jobs still remain in the 'Q' state because the
job owner is user@external_host_name! My question is:
either 1. how can I make the server accept jobs from
user@external_host_name?
or 2. how can I make the server recognize every submitted job as belonging
to user@internal_host_name?

Thanks in advance!

Junjun