[torqueusers] Problems with qmgr
Thomas Dargel
td at chemie.hu-berlin.de
Wed Aug 29 07:06:50 MDT 2007
What is the output of
hostname
and
hostname -f
Do you have entries for the IP addresses/hostnames of 'head' and 'wilma' in
/etc/hosts?
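
As a rough sketch of the /etc/hosts check (the helper name hosts_entry is
made up for illustration and is not part of TORQUE or any standard tool),
something like this prints the lines of a hosts-format file that mention a
given name, ignoring comments:

```shell
# Hypothetical helper: print the line(s) of a hosts-format file that list NAME.
# Usage: hosts_entry NAME [FILE]    (FILE defaults to /etc/hosts)
hosts_entry() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments before matching
        {
            # field 1 is the address; fields 2..NF are hostnames and aliases
            for (i = 2; i <= NF; i++)
                if ($i == n) { print; break }
        }
    ' "$file"
}
```

Running hosts_entry head and hosts_entry wilma prints the matching line for
each name that is listed; empty output means the name has no entry in the
file.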
Greets,
Thomas.
Saurabh Barve wrote:
>>> Due to a dual-NIC setup, there is a conflict between the hostname on the
>>> internal network (head: 172.16.100.1) and the one on the external network
>>> (wilma: 172.20.*.*). As a result the 'qmgr' command returns error messages
>>> for my commands.
>>>
>> Is 'wilma' (172.20.*.*) associated with the first network-device (eth0) and
>> 'head' (172.16.100.1) with the second (e.g. eth1)?
>
>
> No. It is the other way round. The external IP address ('wilma':172.20.*) is
> associated with eth1 and the internal IP address ('head': 172.16.*) is
> associated with eth0.
>
>> Then you should try to start the pbs_server with this extension:
>>
>> #> pbs_server -S wilma:15004
> The 'pbsnodes -a' command gives me reasonable output:
> ==========
> [root@wilma ~]# pbsnodes -a
> head
> state = free
> np = 8
> ntype = cluster
> status = opsys=linux,uname=Linux wilma 2.6.9-55.ELlargesmp #1 SMP Wed
> ...
> ...
> ==========
>
>> Please don't forget to restart 'maui' after you have started 'pbs_server';
>> sometimes this resolves strange behaviour ...
>
> I tried to change the SERVERHOST variable in maui.cfg to 'head', but then
> the maui service wouldn't start:
>
> ==========
> [root@wilma ~]# service maui start
> Starting maui: ERROR: server must be started on host 'head' (currently on
> 'wilma.<snipped>')
> [FAILED]
> ==========
>
>
> I still get errors when I try to use 'head' in the qmgr commands:
> ----------------------
> Qmgr: set server tcp_timeout=5
> qmgr obj= svr=default: Unauthorized Request
> Qmgr: set head tcp_timeout=5
> qmgr: Illegal object type: head.
> Qmgr: set server head tcp_timeout=5
> qmgr obj=head svr=head: Unauthorized Request
> ---------------------
>
>
> But using 'wilma' seems to work:
> ---------------------
> Qmgr: set server wilma tcp_timeout=5
> Qmgr: list server
> <snipped>
> Server wilma
> server_state = Active
> scheduling = True
> <snipped>
> tcp_timeout = 5
> pbs_version = 2.1.2
> ...
> ...
> Qmgr: create queue mtm@wilma
> Qmgr: list queue
> Queue mtm
> total_jobs = 0
> state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:0 Exiting:0
> mtime = Tue Aug 28 11:04:45 2007
> ---------------------
>
> So I reset SERVERHOST to 'wilma' and restarted maui. It started without
> errors.
>
> But once I quit 'qmgr', the queue information is not saved. I activated the
> default 'batch' queue, but my qsub-based job wouldn't run. When I went back
> into qmgr, no active queues were displayed. There does not seem to be a
> 'save' option for 'qmgr'.
>
> -Saurabh