[torqueusers] Torque-HA resource manager integration
prakash.velayutham at cchmc.org
Wed Jul 30 09:59:41 MDT 2008
On Jul 30, 2008, at 10:09 AM, Josh Butikofer wrote:
> Michael and Michael,
> The configuration below that Michael Robbert gave will work for
> communicating with TORQUE running in HA mode, but not for the reasons
> that Mr. Robbert assumed.
> The TORQUE libraries have access to the server_name file found in
> TORQUE's configuration/spool directory. This file lists the
> primary and secondary TORQUE servers. This means Moab can
> communicate through the TORQUE libraries, and the libraries will
> resolve which server to talk to (whichever one is currently running
> with an open socket). Moab does not even have to know that TORQUE
> is running in HA mode; it should just work.
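To make the failover idea concrete, here is a minimal Python sketch of the client-side behavior described above. It is not TORQUE's actual implementation (the real logic lives in TORQUE's C client library). It assumes that in HA mode server_name holds a single comma-separated line such as `s01.local,s02.local`, that pbs_server listens on its default port 15001, and that TORQUE_HOME is /var/spool/torque; the function names are hypothetical.

```python
import socket
from pathlib import Path
from typing import List, Optional

# Assumed default pbs_server port; adjust if your site uses another one.
PBS_SERVER_PORT = 15001


def parse_server_name(text: str) -> List[str]:
    """Split server_name contents into an ordered host list.

    Assumes the HA form 'primary,secondary' on one line.
    """
    return [h.strip() for h in text.strip().split(",") if h.strip()]


def read_server_name(torque_home: str = "/var/spool/torque") -> List[str]:
    # Hypothetical TORQUE_HOME default; many installs use a different path.
    return parse_server_name(Path(torque_home, "server_name").read_text())


def pick_active_server(hosts: List[str], port: int = PBS_SERVER_PORT,
                       timeout: float = 2.0) -> Optional[str]:
    """Return the first host that accepts a TCP connection, or None.

    Tries each listed server in order, mirroring the idea that the
    client library talks to whichever server currently has an open socket.
    """
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            # Refused, unreachable, unresolvable, or timed out: try the next one.
            continue
    return None
```

For example, `pick_active_server(read_server_name())` would return whichever listed server currently accepts connections, or `None` if neither does. Moab never sees this choice, which is why a single RMCFG entry is enough.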
> In other words, you should only need a single RMCFG line
> configuring a single TORQUE RM, as shown in Mr. Robbert's
> moab.cfg file. The RMCFG lines do indeed control both submission
> and data querying from TORQUE (and other resource managers).
> As for the SCHEDCFG line, these parameters only affect the Moab
> scheduler configuration. The additional FBSERVER=s02.local:42559 is
> telling Moab that a secondary Moab Workload Manager daemon is
> running on s02.local, port 42559. This config only controls Moab's
> HA, not TORQUE HA.
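The split between the two parameters can be illustrated with a small, hypothetical Python parser applied to Mr. Robbert's moab.cfg lines quoted further down. It only understands the simple `NAME[index] ATTR=VALUE ...` form and assumes lines starting with `#` are comments; it is a sketch, not Moab's own configuration parser.

```python
import re

# Matches lines like "RMCFG[base] TYPE=PBS": parameter, [index], then attributes.
LINE_RE = re.compile(r"^(\w+)\[([^\]]+)\]\s+(.*)$")


def parse_moab_cfg(text: str) -> dict:
    """Collect moab.cfg-style lines into {(param, index): {attr: value}}.

    Repeated lines for the same param/index (e.g. two RMCFG[base] lines)
    are merged into one entry.
    """
    cfg: dict = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and assumed comments
            continue
        m = LINE_RE.match(line)
        if not m:
            continue
        param, index, rest = m.groups()
        attrs = cfg.setdefault((param, index), {})
        for pair in rest.split():
            key, _, value = pair.partition("=")
            attrs[key] = value
    return cfg


example = """
SCHEDCFG[clustername] SERVER=s01.domain.edu:42559 FBSERVER=s02.local:42559 MODE=NORMAL
RMCFG[base] TYPE=PBS
RMCFG[base] SUBMITCMD=/opt/torque/bin/qsub
"""

cfg = parse_moab_cfg(example)
rms = [idx for (param, idx) in cfg if param == "RMCFG"]
print(rms)  # ['base']: one TORQUE RM, however many pbs_server hosts exist
print(cfg[("SCHEDCFG", "clustername")]["FBSERVER"])  # s02.local:42559
```

Both RMCFG[base] lines merge into a single resource manager entry, while FBSERVER lands under SCHEDCFG: Moab's fallback server is a scheduler setting, not a second TORQUE RM.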
> Hopefully that makes sense.
> >> From Michael Sternberg:
> >> I also have Linux-HA working on the torque-HA node pair, and could
> >> provide a shared IP for the scheduler to talk to. However, as
> >> Linux-HA and pbs_server use different time constants and mechanisms
> >> to trigger failover, this can only lead to a mess when the service
> >> locations are incoherent.
> If you want to use Linux-HA (which I think is a fine idea), you
> should probably disable TORQUE's HA mechanism. As you mentioned,
> using them both together is messy. I would only use one or the other.
For some reason, my repeated attempts to use Linux-HA to provide
TORQUE HA never succeeded. There was some issue with name
resolution failing.
If this is indeed supposed to work, and does work, how do you think
Moab HA would work with such a TORQUE setup?
> Josh Butikofer
> Michael Robbert wrote:
>> We have a similar setup. Here are the lines that we have in our
>> Moab config that appear to be relevant.
>> SCHEDCFG[clustername] SERVER=s01.domain.edu:42559 FBSERVER=s02.local:42559 MODE=NORMAL
>> RMCFG[base] TYPE=PBS
>> RMCFG[base] SUBMITCMD=/opt/torque/bin/qsub
>> So, it looks like RMCFG is only used to submit jobs and SCHEDCFG is
>> used to get data back from Torque.
>> Good luck,
>> Mike Robbert, Colorado School of Mines
> torqueusers mailing list
> torqueusers at supercluster.org
Programmer / Analyst
Cincinnati Children's Hospital Medical Center