[Mauiusers] Maui insists on starting on FQDN or public interface on Rocks cluster

Steven Truong midair77 at gmail.com
Fri Feb 15 14:41:25 MST 2008


Hi, all.  I am new to Rocks but have set up a couple of Beowulf Linux
clusters with Torque and Maui.

This time we bought the new cluster from a vendor, and my senior
manager wanted the vendor to save us time by installing Rocks 4.3
for us.

Currently, Torque + Maui cannot work together, even in a simple setup.

#qmgr -c "print server"
create queue default
set queue default queue_type = Execution
set queue default kill_delay = 90
set queue default enabled = True
set queue default started = True
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False

################################
set server managers = root@jupiter.mydomain.com
set server operators = root@jupiter.mydomain.com
################################
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.walltime = 336:00:00
set server scheduler_iteration = 60
set server node_ping_rate = 300
set server node_check_rate = 600
set server tcp_timeout = 6
set server node_pack = False
set server pbs_version = 2.1.8


-----
#grep -v ^# /opt/maui/maui.cfg
RMPOLLINTERVAL          00:00:3
SERVERPORT              42559
SERVERMODE              NORMAL

#NOTE: the fqdn

SERVERHOST              Jupiter.Mydomain.com
RMCFG[Jupiter.Mydomain.com]             TYPE=PBS

ADMIN1                maui root
LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3
QUEUETIMEWEIGHT       1
BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST
NODEALLOCATIONPOLICY  MINRESOURCE
ENABLEMULTINODEJOBS  TRUE
ENABLEMULTIREQJOBS  TRUE

----
The entries in /etc/hosts are as follows and _cannot_ be changed:

############################################
#cat /etc/hosts | grep Jupiter
10.1.1.1        Jupiter.local Jupiter # originally frontend-0-0
192.168.10.181   Jupiter.Mydomain.com

############################################

The compute nodes are on the 10.1.1.0/8 network.
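
To make the name-to-interface mapping explicit, here is a minimal
Python sketch that parses the two /etc/hosts lines quoted above and
reports which names resolve to an address inside the private network.
It is an illustration only, not part of Rocks, Torque, or Maui. The
names parse_hosts and on_private_net are hypothetical, and the sketch
assumes that "10.1.1.0/8" means the network 10.0.0.0/8.

```python
import ipaddress

# The two lines shown by `cat /etc/hosts | grep Jupiter` in this message.
HOSTS_TEXT = """\
10.1.1.1        Jupiter.local Jupiter # originally frontend-0-0
192.168.10.181   Jupiter.Mydomain.com
"""


def parse_hosts(text):
    """Return {name: ip} from /etc/hosts-style text, ignoring comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first address seen for a given name.
            mapping.setdefault(name, ip)
    return mapping


# strict=False accepts "10.1.1.0/8" and normalizes it to 10.0.0.0/8.
PRIVATE_NET = ipaddress.ip_network("10.1.1.0/8", strict=False)


def on_private_net(ip):
    """True if the address belongs to the compute-node network."""
    return ipaddress.ip_address(ip) in PRIVATE_NET


hosts = parse_hosts(HOSTS_TEXT)
for name, ip in hosts.items():
    print(f"{name:22s} {ip:16s} private={on_private_net(ip)}")
```

With the entries above, only Jupiter.local and Jupiter map to the
private address. Jupiter.Mydomain.com maps to the public one, which
matches the behaviour described in the rest of this message.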


As you can see, the first letter of the domain name (Mydomain) in
SERVERHOST and RMCFG is capitalized. If I change the value to
Jupiter.mydomain.com, jupiter.Mydomain.com, jupiter.mydomain.com,
Jupiter.local, or Jupiter, I get an error of this type:

Starting maui: ERROR:    server must be started on host
'jupiter.Mydomain.com' (currently on 'Jupiter.Mydomain.com')

----
I would like to know where this "(currently on
'Jupiter.Mydomain.com')" value is being set.
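
One place to look is the name the operating system itself reports.
The sketch below assumes, without confirmation from the Maui source,
that the "currently on" value is derived from the system host name and
resolver rather than from maui.cfg. It only collects what the standard
lookups return on the machine where it runs. describe_local_host and
same_name are hypothetical helper names.

```python
import socket


def describe_local_host():
    """Collect the names the OS and the resolver report for this machine."""
    short_name = socket.gethostname()
    info = {"gethostname": short_name, "getfqdn": socket.getfqdn()}
    try:
        canonical, aliases, addresses = socket.gethostbyname_ex(short_name)
        info.update(canonical=canonical, aliases=aliases, addresses=addresses)
    except OSError as exc:  # gaierror and herror are OSError subclasses
        info["lookup_error"] = str(exc)
    return info


def same_name(a, b, case_sensitive=True):
    """Compare two host names exactly, or ignoring case as DNS does."""
    return a == b if case_sensitive else a.lower() == b.lower()


if __name__ == "__main__":
    for key, value in describe_local_host().items():
        print(f"{key:12s} {value}")
```

The two names in the error message above differ only in the case of
the first letter, so same_name() returns False for them by default and
True with case_sensitive=False. That suggests the check Maui performs
is an exact string comparison.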

Currently, I can only set these two parameters to
Jupiter.Mydomain.com, but then Maui listens only on the public
interface and Torque's pbs_server cannot contact Maui to schedule jobs:

Jupiter PBS_Server: Connection refused (111) in contact_sched, Could
not contact Scheduler - port 15004
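
A quick way to confirm which interface the scheduler is reachable on
is to attempt a TCP connection to each address. The sketch below is a
generic probe, not a Torque or Maui tool. port_open and probe are
hypothetical helper names. The addresses and ports to try would be the
two from /etc/hosts and the two quoted in this message (15004 from the
pbs_server error, 42559 from SERVERPORT).

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def probe(hosts, ports, timeout=2.0):
    """Return {(host, port): bool} for every host/port combination."""
    return {
        (host, port): port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

On the frontend, calling
probe(("10.1.1.1", "192.168.10.181"), (15004, 42559)) would show
whether the scheduler port answers on the private address, the public
address, or both.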

If I modify /etc/hosts to have "10.1.1.1 Jupiter.Mydomain.com", then
Torque and Maui work, but after I reboot the frontend this manually
set entry is overwritten by the information in the database.  I looked
at the PHP admin interface and would like to change that value there,
but an engineer from the vendor told me that I should not change that
entry because that is the way Rocks works.

Please advise what I should modify to make Maui listen on the private
interface or on a non-FQDN name so that Torque can communicate with it.

Thank you very much for your help.
Steven.

