[torqueusers] jobids and server names?

Roy Dragseth roy.dragseth at cc.uit.no
Wed Dec 5 13:15:14 MST 2007


On Saturday 22 September 2007, Roy Dragseth wrote:
> On Friday 21 September 2007, Garrick Staples wrote:
> > On Fri, Sep 21, 2007 at 09:44:19AM +0200, Roy Dragseth alleged:
> > > Hi.
> > >
> > > Is it possible to change the name string that gets attached to the
> > > jobid number to anything else than the name of the server running
> > > pbs_server?
> > >
> > > I want to set up a cluster with login nodes and hide the real frontend
> > > from the users in the following way:
> > >
> > > Public name: my_cluster.domain.org
> > > Login node 1: my_login1.domain.org
> > > Login node 2: my_login2.domain.org
> > > Core node: my_cluster_core.domain.org
> > >
> > > my_login1 and my_login2 shall have some ip takeover mechanism for the
> > > address associated with my_cluster.domain.org.
> > >
> > > pbs_server runs on my_cluster_core, and through the standard config all
> > > jobs would have jobids like 12345.my_cluster_core.domain.org.
> > > Is it possible to make the jobids look like 12345.my_cluster.domain.org
> > > instead?
> >
> > That's the server_name server attribute that you can set with qmgr.
>
> Thanks, that did the trick.
>
> The server_name seems to be restricted to valid hostnames, any reason for
> that?  Does it have any meaning or is it just a string attached to the
> jobnumber?
>

Setting the server_name seems to create problems when one wants to use the
jobid for something. For instance, try querying a job with qstat -f.  Using
the example above:

I have set server_name = my_cluster.domain.org (which is an alias for the
login nodes that use IP takeover), while pbs_server itself is running on
my_cluster_core.domain.org.

[root@my_cluster_core named]# qstat -a

my_cluster.domain.org:
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
31.my_cluster.domain.org     royd     default  STDIN         --      1  --    --  02:46 R   --
[root@my_cluster_core named]# qstat -f 31
qstat: Unknown Job Id 31.my_cluster_core.domain.org
[root@my_cluster_core named]# qstat -f 31@my_cluster_core.domain.org
qstat: Unknown Job Id 31.my_cluster_core.domain.org
[root@my_cluster_core named]# qstat -f 31.my_cluster.domain.org
Connection refused
qstat: cannot connect to server my_cluster.domain.org (errno=111)

However, using jobid@real-pbs-server-name works:

qstat -f 31.my_cluster.domain.org@my_cluster_core.domain.org

gives the desired result, but I do not want to expose my users to this.
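
A possible stopgap, sketched here only as an illustration: a small shell
helper that expands whatever the user types into the jobid@real-server form
that worked above. The function name qualify_jobid and the two variables are
hypothetical, and the host names are simply the ones from this example. It
only builds the string; it does not change how pbs_server resolves job ids.

```shell
#!/bin/sh
# Hypothetical helper, not part of TORQUE. Host names come from the example above.
SERVER_ALIAS="my_cluster.domain.org"       # value given to server_name (public alias)
REAL_SERVER="my_cluster_core.domain.org"   # host where pbs_server actually runs

# Expand a job id into the form <number>.<alias>@<real server>.
qualify_jobid() {
    id=$1
    case $id in
        *@*) printf '%s\n' "$id" ;;                    # already names a server: leave as-is
        *.*) printf '%s@%s\n' "$id" "$REAL_SERVER" ;;  # e.g. 31.my_cluster.domain.org
        *)   printf '%s.%s@%s\n' "$id" "$SERVER_ALIAS" "$REAL_SERVER" ;;  # bare number, e.g. 31
    esac
}

# Example use (needs the TORQUE client tools installed):
#   qstat -f "$(qualify_jobid 31)"
```

This would have to be wrapped around every client command that takes a job
id, so it hides the problem rather than fixing it.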

Any thoughts on how to fix this?

r.
-- 

  The Computer Center, University of Tromsø, N-9037 TROMSØ Norway.
              phone:+47 77 64 41 07, fax:+47 77 64 41 00
     Roy Dragseth, Team Leader, High Performance Computing
         Direct call: +47 77 64 62 56. email: royd at cc.domain.org

