[torqueusers] Server not talking to MOMs at all

Lennart Karlsson Lennart.Karlsson at nsc.liu.se
Thu Sep 8 02:52:00 MDT 2005

Troy, you wrote:
> BTW, the $headnode terminology IMHO doesn't make sense in an environment
> where the system running pbs_server (and likely pbs_sched/maui/moab as
> well) is *NOT* a login node, although I'm not sure if that describes
> anyone's system structure other than ours.  In any case, I would argue
> that $pbsserver (or alternately $serverhost) is much more descriptive.

We also have such environments, with login nodes as separate nodes from
the Maui/PBS servers, and I support your naming alternatives.

> I don't see how you can make multi-server support work
> without a shared filesystem (with working locks, i.e. not NFS) for
> $PBS_HOME and the addition of a whole bunch of cluster membership and
> voting code to pbs_server.  It seems like overkill if what we're really
> looking for is simply a high-availability pbs_server.
> What exactly are people looking for WRT multi-server support in TORQUE,
> anyway?

The state of Torque today, in our environments, is that it misbehaves or
crashes in one way or another (the current favourite: it eats all available
physical and swap memory) far more often than the underlying Linux and
hardware do. Without having measured it, I would estimate the difference at
more than a factor of 100. So we see no reason to set up two PBS servers for
one cluster. Torque is becoming more robust, but there is still a long way
to go.

I believe that adding more "bells and whistles" to Torque could make it less
available. We need to change Torque, but we have to make the right choices
and be careful not to put "everything" into it.

So I would like Torque to be less complicated, not more. I vote NO to
multi-server support.

Best wishes,
-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
   National Supercomputer Centre in Linkoping, Sweden
