[torqueusers] Problems with qmgr
Saurabh Barve
sbarve at nps.edu
Wed Aug 29 08:53:02 MDT 2007
[root@wilma ~]# hostname
wilma
[root@wilma ~]# hostname -f
wilma.uc.nps.edu
I have these entries in the /etc/hosts file:
-----------
172.20.56.228 wilma.uc.nps.edu wilma
172.16.100.1 head
-----------
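As a quick cross-check of those entries, here is a minimal sketch of a helper that prints the address a hosts-format file assigns to a given name. The function name `hosts_lookup` is hypothetical (it is not part of TORQUE or any standard tool), and it only reads the file; it does not consult DNS or NIS.

```shell
# hosts_lookup NAME [FILE]
# Hypothetical helper: print the address(es) that a hosts-format file
# maps NAME to. FILE defaults to /etc/hosts. Only the file is read;
# DNS and NIS are not consulted.
hosts_lookup() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }                # drop comments; awk re-splits the fields
        NF >= 2 {
            for (i = 2; i <= NF; i++)     # fields 2..NF are hostnames and aliases
                if ($i == name) print $1  # field 1 is the address
        }
    ' "$file"
}
```

Against a file containing only the two entries above, `hosts_lookup wilma` prints 172.20.56.228 and `hosts_lookup head` prints 172.16.100.1. The match is exact, so the short name and the fully qualified name have to be looked up separately.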
Would it help if I switched the network interfaces so that the external IP
is on eth0 and the internal IP is on eth1?
Wouldn't changing the 'hostname' and 'domainname' to "head" break my NIS/YP
services?
Thanks,
Saurabh
--
Saurabh Barve
sbarve at nps.edu
831-656-3396
> From: Thomas Dargel <td at chemie.hu-berlin.de>
> Date: Wed, 29 Aug 2007 15:06:50 +0200
> To: Saurabh Barve <sbarve at nps.edu>, <torqueusers at supercluster.org>
> Subject: Re: [torqueusers] Problems with qmgr
>
> What is the output of
>
> hostname
>
> and
>
> hostname -f
>
> Do you have entries for the IP addresses/hostnames of 'head' and 'wilma' in
> /etc/hosts?
>
> Greets,
>
> Thomas.
>
> Saurabh Barve wrote:
>>>> Due to a dual-NIC setup, there is a conflict between the hostname on the
>>>> internal network (head: 172.16.100.1) and the one on the external network
>>>> (wilma: 172.20.*.*). As a result the 'qmgr' command returns error messages
>>>> for my commands.
>>>>
>>> Is 'wilma' (172.20.*.*) associated with the first network-device (eth0) and
>>> 'head' (172.16.100.1) with the second (e.g. eth1)?
>>
>>
>> No. It is the other way round. The external IP address ('wilma':172.20.*) is
>> associated with eth1 and the internal IP address ('head': 172.16.*) is
>> associated with eth0.
>>
>>> Then you should try to start the pbs_server with this extension:
>>>
>>> #> pbs_server -S wilma:15004
>> The 'pbsnodes -a' command gives me reasonable output:
>> ==========
>> [root@wilma ~]# pbsnodes -a
>> head
>> state = free
>> np = 8
>> ntype = cluster
>> status = opsys=linux,uname=Linux wilma 2.6.9-55.ELlargesmp #1 SMP Wed
>> ...
>> ...
>> ==========
>>
>>> Please, don't forget to restart 'maui' after you started 'pbs_server',
>>> sometimes this solves a strange behaviour ...
>>
>> I tried to change the SERVERHOST variable in maui.cfg to 'head', but then
>> the maui service wouldn't start:
>>
>> ==========
>> [root@wilma ~]# service maui start
>> Starting maui: ERROR: server must be started on host 'head' (currently on
>> 'wilma.<snipped>')
>> [FAILED]
>> ==========
>>
>>
>> I still get errors when I try to use 'head' in the qmgr commands:
>> ----------------------
>> Qmgr: set server tcp_timeout=5
>> qmgr obj= svr=default: Unauthorized Request
>> Qmgr: set head tcp_timeout=5
>> qmgr: Illegal object type: head.
>> Qmgr: set server head tcp_timeout=5
>> qmgr obj=head svr=head: Unauthorized Request
>> ---------------------
>>
>>
>> But using 'wilma' seems to work:
>> ---------------------
>> Qmgr: set server wilma tcp_timeout=5
>> Qmgr: list server
>> <snipped>
>> Server wilma
>> server_state = Active
>> scheduling = True
>> <snipped>
>> tcp_timeout = 5
>> pbs_version = 2.1.2
>> ...
>> ...
>> Qmgr: create queue mtm@wilma
>> Qmgr: list queue
>> Queue mtm
>> total_jobs = 0
>> state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0
>> Exiting:0
>> mtime = Tue Aug 28 11:04:45 2007
>> ---------------------
>>
>> So I reset SERVERHOST to 'wilma' and restarted maui. It started without
>> errors.
>>
>> But once I quit 'qmgr', the queue information is not saved. I activated the
>> default 'batch' queue, but my qsub-based job would not run. When I went back
>> into qmgr, no active queues were displayed. There does not seem to be a
>> 'save' option for 'qmgr'.
>>
>> -Saurabh