[torqueusers] installation/configuration problem with multi-homed system. --- unauthorized host/request
ljjlp03 at gmail.com
Mon Nov 14 19:34:42 MST 2011
Thank you very much! It works!
On Mon, Nov 14, 2011 at 10:26 PM, Jason Bacon <jwbacon at tds.net> wrote:
> I had a similar issue and got around it by simply setting up /etc/hosts
> on each node properly.
> On the multihomed head node, the hostname is bound to the external IP in
> /etc/hosts. On the compute nodes, the hostname of the head node is
> bound to its internal address. Also be sure that name resolution on
> the compute nodes is configured to check files before DNS.
> No special configuration was required within torque.
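For readers who want to verify this setup, here is a minimal shell sketch that is not part of the original thread. It assumes a glibc-style nsswitch.conf, and the function name check_hosts_order is hypothetical. It reports whether "files" is consulted before "dns" on the hosts: line, which is what lets the /etc/hosts entries described above take precedence.

```shell
# Hypothetical helper: prints "ok" if "files" comes before "dns" on the
# hosts: line of the given nsswitch-style file, "bad" if it does not,
# and "missing" if the file has no hosts: line at all.
check_hosts_order() {
    awk '$1 == "hosts:" {
        f = 0; d = 0
        for (i = 2; i <= NF; i++) {
            if ($i == "files" && !f) f = i   # position of first "files"
            if ($i == "dns" && !d) d = i     # position of first "dns"
        }
        if (f && (!d || f < d)) print "ok"; else print "bad"
        found = 1
        exit
    }
    END { if (!found) print "missing" }' "$1"
}

# Example call on a compute node (path assumed to be the usual one):
#   check_hosts_order /etc/nsswitch.conf
```

On a compute node, running getent hosts with the head node's name should then print the internal address, since getent resolves names through the same nsswitch order.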
> On 11/13/11 09:48, liu junjun wrote:
> > Hi everyone,
> > I am trying to install torque-3.0.2 on a multi-homed system (two NIC
> > networks) but am running into an authorization problem. Please read my
> > description of the problem below. Any help is highly appreciated!
> > ---- System information ----
> > OS: Ubuntu 10.10
> > eth0: external_host_name
> > eth1: internal_host_name
> > hostname: internal_host_name
> > --------------------------------------------
> > ---- Basic Torque information ----
> > Torque version: 3.0.2
> > content of /var/spool/torque/server_name: internal_host_name
> > content of /var/spool/torque/torque.cfg: SERVERHOST internal_host_name
> > server and nodes can ping each other with internal_host_name
> > ----------------------------------------
> > ---- the problem -------------
> > 1. My first try on the installation:
> > By following the installation document at
> > I have a problem with the "torque.setup" script. It gave me "unauthorized
> > request". I noticed that the problem may be related to my two NICs.
> > Then I double-checked the server_name file and also added "SERVERHOST
> > internal_host_name" to torque.cfg. Unfortunately, the problem still remains.
> > 2. My 2nd try on the installation:
> > I removed the first installation, disabled eth0 (which is
> > associated with external_host_name), and recompiled torque with
> > exactly the same steps as in my first try.
> > Everything seemed fine. I could create a batch queue and submit jobs
> > that ran and terminated normally. However, once I enable eth0
> > (external_host_name), every qmgr command returns "unauthorized
> > request". I noticed that the server recognizes me as
> > user at external_host_name, whereas the pbs server is set as
> > internal_host_name, which is also the hostname. I guess this causes the
> > "unauthorized" issue, so I disabled eth0 again to regain access
> > and made the following settings:
> > ====
> > qmgr -c 's s acl_hosts += external_host_name'
> > qmgr -c 's s managers += root@external_host_name'
> > qmgr -c 's s operators += root@external_host_name'
> > qmgr -c 's s submit_hosts += external_host_name'
> > ====
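The "s s" in the commands above is qmgr shorthand for "set server". Here is a minimal sketch, not from the original post, that prints the long-form equivalents for a given host so they can be reviewed before being run. The function name print_acl_commands and its arguments are hypothetical, and the user entries are written in the user@host form that qmgr expects.

```shell
# Hypothetical helper: prints (does not execute) the long-form qmgr
# commands that add a host to the server ACLs.
#   $1 = host name to authorize
#   $2 = user to add as manager/operator (defaults to root)
print_acl_commands() {
    host=$1
    user=${2:-root}
    printf "qmgr -c 'set server acl_hosts += %s'\n" "$host"
    printf "qmgr -c 'set server managers += %s@%s'\n" "$user" "$host"
    printf "qmgr -c 'set server operators += %s@%s'\n" "$user" "$host"
    printf "qmgr -c 'set server submit_hosts += %s'\n" "$host"
}

# Example: review the commands for the external name first.
#   print_acl_commands external_host_name
```

Printing instead of executing keeps the sketch safe to try on a machine without a running pbs_server. The printed lines can be pasted into a root shell on the server once they look right.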
> > After the above commands, I have operational access to the
> > pbs_server even when eth0 is enabled. However, all submitted jobs
> > still remain in the Q state. The following are parts of the 'qstat
> > -f' output and the log files on the server:
> > ==== part of 'qstat -f' command =====
> > Job Id: 51.internal_host_name
> > Job_Name = STDIN
> > Job_Owner = user at external_host_name
> > job_state = Q
> > queue = batch
> > server = internal_host_name
> > Checkpoint = u
> > ctime = Sun Nov 13 19:25:12 2011
> > Error_Path = internal_host_name:/home/liu/STDIN.e51
> > Hold_Types = n
> > Join_Path = n
> > Keep_Files = n
> > Mail_Points = a
> > mtime = Sun Nov 13 19:25:12 2011
> > Output_Path = internal_host_name:/home/liu/STDIN.o51
> > ===============================
> > ==== part of pbs_server log ======
> > 11/13/2011 19:25:05;0002;PBS_Server;Svr;PBS_Server;Torque Server
> > Version = 3.0.2, loglevel = 0
> > 11/13/2011 19:25:12;0100;PBS_Server;Job;51.internal_host_name;enqueuing
> > into batch, state 1 hop 1
> > 11/13/2011 19:25:12;0008;PBS_Server;Job;51.internal_host_name;Job
> > Queued at request of user at external_host_name, owner =
> > user at external_host_name, job name = STDIN, queue = batch
> > 11/13/2011 19:25:12;0040;PBS_Server;Svr;cddlogin;Scheduler was sent
> > the command new
> > 11/13/2011 19:25:12;0080;PBS_Server;Req;dis_request_read;req header
> > bad, dis error 7 (Premature end of message), type=Connect
> > 11/13/2011 19:25:12;0080;PBS_Server;Req;req_reject;Reject reply
> > code=15058(Bad DIS based Request Protocol MSG=cannot decode message),
> > aux=0, type=Connect, from @
> > 11/13/2011 19:25:12;0002;PBS_Server;Req;dis_reply_write;DIS reply
> > failure, -1
> > =========================
> > ==== part of pbs_sched log ======
> > 11/13/2011 19:25:12;0001; pbs_sched;Svr;pbs_sched;LOG_ERROR::badconn,
> > external_host_name on port 762 unauthorized host
> > ==========================
> > As you can see from the above information, although external_host_name
> > is set as a submit_host, all jobs still remain in the 'Q' state
> > because the job owner is user at external_host_name! My question is
> > either: 1. how can I make the server accept jobs from
> > users at external_host_name?
> > or 2. how can I make the server recognize every submitted job as
> > belonging to user at internal_host_name?
> > Thanks in advance!
> > Junjun
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
> Jason W. Bacon
> jwbacon at tds.net