[Mauiusers] Maui insists on starting on FQDN or public interface on Rocks cluster
Steven Truong
midair77 at gmail.com
Fri Feb 15 14:41:25 MST 2008
Hi, all. I am new to Rocks but have set up a couple of Beowulf Linux
clusters with Torque and Maui.
This time we bought a new cluster from a vendor, and my senior
manager wanted to save time by having the vendor install Rocks 4.3
for us.
Currently, Torque and Maui cannot work together, even in a simple setup.
#qmgr -c "print server"
create queue default
set queue default queue_type = Execution
set queue default kill_delay = 90
set queue default enabled = True
set queue default started = True
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_host_enable = False
################################
set server managers = root@jupiter.mydomain.com
set server operators = root@jupiter.mydomain.com
################################
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server resources_default.walltime = 336:00:00
set server scheduler_iteration = 60
set server node_ping_rate = 300
set server node_check_rate = 600
set server tcp_timeout = 6
set server node_pack = False
set server pbs_version = 2.1.8
-----
#grep -v ^# /opt/maui/maui.cfg
RMPOLLINTERVAL 00:00:3
SERVERPORT 42559
SERVERMODE NORMAL
#NOTE: the fqdn
SERVERHOST Jupiter.Mydomain.com
RMCFG[Jupiter.Mydomain.com] TYPE=PBS
ADMIN1 maui root
LOGFILE maui.log
LOGFILEMAXSIZE 10000000
LOGLEVEL 3
QUEUETIMEWEIGHT 1
BACKFILLPOLICY FIRSTFIT
RESERVATIONPOLICY CURRENTHIGHEST
NODEALLOCATIONPOLICY MINRESOURCE
ENABLEMULTINODEJOBS TRUE
ENABLEMULTIREQJOBS TRUE
----
The relevant entries in /etc/hosts are the following and _cannot_ be changed:
############################################
#cat /etc/hosts | grep Jupiter
10.1.1.1 Jupiter.local Jupiter # originally frontend-0-0
192.168.10.181 Jupiter.Mydomain.com
############################################
The compute nodes are on the 10.1.1.0/8 network.
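To see why using Jupiter.Mydomain.com ends up on the public interface,
it helps to look at how those two /etc/hosts lines resolve. The sketch
below is a simplified, hypothetical parser (`addresses_for` is a name
made up for illustration, not part of Maui or the resolver). It assumes
the usual hosts(5) layout: an address followed by one or more names,
with '#' starting a comment, and it matches names case-insensitively,
since hostnames are case-insensitive under DNS rules.

```python
def addresses_for(name: str, hosts_text: str) -> list[str]:
    """Return every address whose line lists `name`, ignoring case.

    Hypothetical helper: a simplified model of an /etc/hosts lookup,
    where the first field is the address, the remaining fields are
    hostnames/aliases, and '#' starts a comment.
    """
    wanted = name.lower()
    matches = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or comment-only line
        address, names = fields[0], fields[1:]
        if wanted in (n.lower() for n in names):
            matches.append(address)
    return matches


# The two entries quoted above.
HOSTS = """\
10.1.1.1 Jupiter.local Jupiter # originally frontend-0-0
192.168.10.181 Jupiter.Mydomain.com
"""

if __name__ == "__main__":
    for host in ("Jupiter", "Jupiter.local", "Jupiter.Mydomain.com"):
        print(host, "->", addresses_for(host, HOSTS))
```

With these entries, Jupiter and Jupiter.local map only to 10.1.1.1,
while Jupiter.Mydomain.com maps only to 192.168.10.181, which is
consistent with Maui binding to the public address when SERVERHOST is
the FQDN.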
As you can see, the first letter of the domain name (Mydomain) in
SERVERHOST and RMCFG is capitalized. If I change it to
Jupiter.mydomain.com, jupiter.Mydomain.com, jupiter.mydomain.com,
Jupiter.local, or Jupiter, I get this type of error:
Starting maui: ERROR: server must be started on host
'jupiter.Mydomain.com' (currently on 'Jupiter.Mydomain.com')
----
I would like to know where this "(currently on
'Jupiter.Mydomain.com')" value is being set.
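The error text suggests that Maui compares the configured SERVERHOST
against a name it derives for the local machine using an exact,
case-sensitive string comparison, even though hostnames are
case-insensitive under DNS rules (RFC 4343). Which local name Maui
actually uses is an assumption here: it may be the raw gethostname()
result or a name resolved from it. The minimal sketch below prints both
candidates so they can be compared with the "(currently on ...)" string,
and contrasts an exact comparison with a DNS-style one. `strict_match`
and `dns_equal` are hypothetical helpers, not Maui functions.

```python
import socket


def strict_match(configured: str, current: str) -> bool:
    """Exact, case-sensitive comparison (what the Maui error appears to do)."""
    return configured == current


def dns_equal(a: str, b: str) -> bool:
    """Case-insensitive comparison as DNS treats names; ignores a trailing dot."""
    return a.rstrip(".").lower() == b.rstrip(".").lower()


if __name__ == "__main__":
    # Names this machine reports for itself; compare them with SERVERHOST.
    print("gethostname():", socket.gethostname())
    print("getfqdn():    ", socket.getfqdn())
    print(strict_match("jupiter.Mydomain.com", "Jupiter.Mydomain.com"))  # False
    print(dns_equal("jupiter.Mydomain.com", "Jupiter.Mydomain.com"))     # True
```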
Currently, I can only set these two parameters to
Jupiter.Mydomain.com, but then Maui listens only on the public
interface, and Torque's pbs_server cannot contact Maui to schedule jobs:
Jupiter PBS_Server: Connection refused (111) in contact_sched, Could not contact Scheduler - port 15004
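To check which address actually accepts connections, one can try a TCP
connection from the frontend to each interface. This minimal sketch
assumes it is run on the frontend itself; the addresses come from the
/etc/hosts entries above, port 15004 from the pbs_server log message,
and port 42559 from SERVERPORT in maui.cfg. `is_listening` is a
hypothetical helper. A refused connection raises ConnectionRefusedError,
which corresponds to the errno 111 (ECONNREFUSED on Linux) that
pbs_server reports.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes ConnectionRefusedError and timeouts
        return False


if __name__ == "__main__":
    # Private and public frontend addresses; scheduler and Maui server ports.
    for address in ("10.1.1.1", "192.168.10.181"):
        for port in (15004, 42559):
            state = "open" if is_listening(address, port) else "closed/refused"
            print(f"{address}:{port} {state}")
```

A port that is open on 192.168.10.181 but refused on 10.1.1.1 would
match the symptom described here.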
If I modify /etc/hosts to contain "10.1.1.1 Jupiter.Mydomain.com",
Torque and Maui work, but after I reboot the frontend this manually
set entry is overwritten by the information in the Rocks database. I
used the PHP admin interface and would like to change that value
there, but an engineer from the vendor told me not to change that
entry because that is the way Rocks works.
Please advise what I should modify to make Maui listen on the private
interface (or use a non-FQDN hostname) so that Torque can communicate
with it.
Thank you very much for your help.
Steven.