[Mauiusers] Maui stop by itself ... (NIS)

Miles O'Neal meo at intrinsity.com
Sun Jan 20 08:34:52 MST 2008

vaibhav agrawal said...

|Have u tried the solution of enabling nscd on all the nodes?

I would note that in general, torque seems
to require a good, solid, fast NIS setup (as
does PBS).  We ended up revamping our entire
NIS setup when we upgraded and expanded the
simfarm and desktops.

This included more, faster NIS servers, a
lot of network tweaking (which helped torque
and maui in other ways as well) and running
nscd to cache passwd, group and hosts.

Someone here is currently experimenting with
running without nscd, as we had other problems
when we ran it on all nodes.[1]  nscd can help, but
the key things for us seem to be the overall
NIS and infrastructure issues, running nscd
on the torque/maui server, and serving hosts
through NIS and caching that in nscd on the
T/M server.

So maybe running nscd everywhere was masking
problems more than actually helping.  But
torque and/or maui seem to insist on trying
to look up hosts via NIS (yes, we played with
nsswitch.conf a lot), so we end up using nscd
on the server.
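
One rough way to see whether host lookups are slow
on a given machine is to time them through the
system resolver, which follows nsswitch.conf and
uses nscd when it is running.  This is only a
sketch: the 0.1 second threshold and the
"localhost" placeholder are assumptions, and the
function names are made up for illustration.

```python
import socket
import time


def time_lookup(host, resolve=socket.gethostbyname):
    """Return (seconds, address or None) for one lookup via the system resolver."""
    start = time.perf_counter()
    try:
        address = resolve(host)
    except OSError:  # covers socket.gaierror and socket.herror
        address = None
    return time.perf_counter() - start, address


def slow_hosts(hosts, threshold=0.1, resolve=socket.gethostbyname):
    """Return hosts whose lookup failed or took longer than threshold seconds."""
    flagged = []
    for host in hosts:
        elapsed, address = time_lookup(host, resolve)
        if address is None or elapsed > threshold:
            flagged.append(host)
    return flagged


if __name__ == "__main__":
    # "localhost" is a placeholder; substitute your compute node names.
    for name in ["localhost"]:
        elapsed, address = time_lookup(name)
        print(f"{name}: {address} in {elapsed * 1000:.1f} ms")
```

Running it on the T/M server with and without nscd
should give a feel for how much the cache is
actually buying you.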

Finally, we did implement a cron job that checks
for maui and restarts it if it has died, because
we still get unexplained deaths once in a blue moon.
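
A watchdog of that kind could be sketched roughly
as follows in Python.  This is not the script used
here: the process name, the install path and the
function names are assumptions for illustration,
and it assumes the start command returns once the
daemon has been launched.

```python
#!/usr/bin/env python3
"""Cron-driven watchdog sketch: restart maui if it is not running."""
import subprocess
import sys

# Assumptions (not from the original post); adjust for your site.
PROCESS_NAME = "maui"
START_COMMAND = ["/usr/local/maui/sbin/maui"]  # hypothetical install path


def is_running(name, run=subprocess.run):
    """Return True if pgrep finds a process with exactly this name."""
    result = run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def ensure_running(name=PROCESS_NAME, start=START_COMMAND, run=subprocess.run):
    """Start the daemon if it is missing; return True if a restart was attempted."""
    if is_running(name, run):
        return False
    print(f"{name} not running, restarting", file=sys.stderr)
    run(start, check=False)
    return True


if __name__ == "__main__":
    try:
        ensure_running()
    except OSError as exc:  # e.g. pgrep or the maui binary is missing
        print(f"watchdog error: {exc}", file=sys.stderr)
```

Cron would then run the script every few minutes.
Anything it writes to stderr ends up in cron's
mail, which gives a crude record of how often
the deaths happen.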

Miles O'Neal
Manager, NSA
meo at intrinsity.com
30° 18' 39N, 97° 55' 1W

[1] This may have just been a problem with the
    versions we tried, but we don't know that.

