[torqueusers] Problems with qmgr

Saurabh Barve sbarve at nps.edu
Wed Aug 29 08:53:02 MDT 2007


[root@wilma ~]# hostname
wilma
[root@wilma ~]# hostname -f
wilma.uc.nps.edu

I have these entries in the /etc/hosts file:
-----------
172.20.56.228   wilma.uc.nps.edu        wilma
172.16.100.1    head
-----------
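
To make the name-to-address mapping explicit, here is a minimal Python sketch that parses the two entries above and prints which address each name resolves to. The function name `parse_hosts` and the variable `HOSTS_TEXT` are made up for illustration, and the "first entry wins" rule is an assumption of this sketch rather than a statement about how TORQUE or maui resolve names.

```python
def parse_hosts(text):
    """Map every hostname or alias in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Assumption of this sketch: the first entry for a name wins.
            mapping.setdefault(name, ip)
    return mapping


# The two entries quoted above (hypothetical variable name).
HOSTS_TEXT = """\
172.20.56.228   wilma.uc.nps.edu        wilma
172.16.100.1    head
"""

hosts = parse_hosts(HOSTS_TEXT)
for name in ("wilma", "wilma.uc.nps.edu", "head"):
    print(name, "->", hosts.get(name))
```

With these entries, 'wilma' and 'head' map to different addresses, which matches the external/internal split described in this thread.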

Would it help if I switched the network interfaces so that the external IP
is on eth0 and the internal IP is on eth1?

Wouldn't changing the 'hostname' and 'domainname' to "head" break my NIS/YP
services?

Thanks,
Saurabh
-- 
Saurabh Barve
sbarve at nps.edu
831-656-3396




> From: Thomas Dargel <td at chemie.hu-berlin.de>
> Date: Wed, 29 Aug 2007 15:06:50 +0200
> To: Saurabh Barve <sbarve at nps.edu>, <torqueusers at supercluster.org>
> Subject: Re: [torqueusers] Problems with qmgr
> 
> What is the output of
> 
>    hostname
> 
> and
> 
>    hostname -f
> 
> Do you have entries for the IP addresses and hostnames of 'head' and
> 'wilma' in /etc/hosts?
> 
> Greets,
> 
> Thomas.
> 
> Saurabh Barve wrote:
>>>> Due to a dual-NIC setup, there is a conflict between the hostname on the
>>>> internal network (head: 172.16.100.1) and the one on the external network
>>>> (wilma: 172.20.*.*). As a result the 'qmgr' command returns error messages
>>>> for my commands.
>>>> 
>>> Is 'wilma' (172.20.*.*) associated with the first network-device (eth0) and
>>> 'head' (172.16.100.1) with the second (e.g. eth1)?
>> 
>> 
>> No. It is the other way round. The external IP address ('wilma':172.20.*) is
>> associated with eth1 and the internal IP address ('head': 172.16.*) is
>> associated with eth0.
>>  
>>> Then you should try to start the pbs_server with this extension:
>>> 
>>> #> pbs_server -S wilma:15004
>> The 'pbsnodes -a' command gives me reasonable output:
>> ==========
>> [root@wilma ~]# pbsnodes -a
>> head
>>      state = free
>>      np = 8
>>      ntype = cluster
>>      status = opsys=linux,uname=Linux wilma 2.6.9-55.ELlargesmp #1 SMP Wed
>> ...
>> ...
>> ========== 
>> 
>>> Please don't forget to restart 'maui' after you have started 'pbs_server';
>>> this sometimes resolves strange behaviour ...
>> 
>> I tried to change the SERVERHOST variable in maui.cfg to 'head', but then
>> the maui service wouldn't start:
>> 
>> ==========
>> [root@wilma ~]# service maui start
>> Starting maui: ERROR:    server must be started on host 'head' (currently on 'wilma.<snipped>')
>>                                                            [FAILED]
>> ==========
>> 
>> 
>> I still get errors when I try to use 'head' in the qmgr commands:
>> ----------------------
>> Qmgr: set server tcp_timeout=5
>> qmgr obj= svr=default: Unauthorized Request
>> Qmgr: set head tcp_timeout=5
>> qmgr: Illegal object type: head.
>> Qmgr: set server head tcp_timeout=5
>> qmgr obj=head svr=head: Unauthorized Request
>> ---------------------
>> 
>> 
>> But using 'wilma' seems to work:
>> ---------------------
>> Qmgr: set server wilma tcp_timeout=5
>> Qmgr: list server
>> <snipped>
>> Server wilma
>>         server_state = Active
>>         scheduling = True
>>         <snipped>
>>         tcp_timeout = 5
>>         pbs_version = 2.1.2
>> ...
>> ...
>> Qmgr: create queue mtm@wilma
>> Qmgr: list queue
>> Queue mtm
>>         total_jobs = 0
>>         state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
>>         mtime = Tue Aug 28 11:04:45 2007
>> ---------------------
>> 
>> So I reset SERVERHOST to 'wilma' and restarted maui. It started without
>> errors.
>> 
>> But once I quit 'qmgr', the queue information is not saved. I activated the
>> default 'batch' queue, but my qsub-based job wouldn't run. When I went back
>> into qmgr, no active queues were displayed. There does not seem to be a
>> 'save' option for 'qmgr'.
>> 
>> -Saurabh


