[torqueusers] Maui dies immediately on job submission
garrick at clusterresources.com
Tue Feb 27 12:24:44 MST 2007
On Tue, Feb 27, 2007 at 02:24:53PM -0500, Lippert, Kenneth B. alleged:
> I know others have reported the same or similar problems, but I cannot
> find a solution that works for me in the archives.
> Simple problem, I start pbs_mom, pbs_server, and Maui. All is well.
> "pbsnodes -a" shows what it should. All the queues, etc are set up as
> they should be.
> As soon as I submit any sort of job the Maui process dies leaving no
> clue in its log file. The last entries are:
> INFO job '9' successfully started
> /var/log/messages says that Maui seg-faulted.
> I get no stdout or stderr from the job itself, it doesn't appear that it
> actually started running anywhere. I only have two nodes active at the
> moment, the head node where pbs_server (and pbs_mom) are running, and
> one other with just pbs_mom.
> I have seen references to Maui dying immediately if there is a
> mismatch between what Maui thinks the host's name is and what uname
> returns, but I have triple checked that, and that is not the problem.
> Thank you for any direction you can give.
> ps. This is a brand new Redhat Enterprise release 4 install with the
> latest Torque/Maui.
You are overcomplicating things :)
The problem is that maui is segfaulting and never gets to start a job.
We just need a backtrace of maui segfaulting in gdb.
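One way to capture that backtrace, as a rough sketch only: attach gdb to the already-running maui process in batch mode, let it continue, submit a job from another terminal, and let gdb print the stack once the segfault hits. The helper name gdb_backtrace_command and the way the PID is passed in are made up for illustration; the gdb options used (-batch, -p, -ex) are standard, but this assumes gdb is installed and that you run it as a user allowed to attach to maui.

```python
import subprocess
import sys


def gdb_backtrace_command(pid):
    """Build a gdb batch command line (hypothetical helper, not part of Maui).

    gdb attaches to the process with the given PID, resumes it with
    'continue', and runs 'bt' once the process stops again, which is
    where a SIGSEGV would leave it.
    """
    return [
        "gdb", "-batch",      # run the -ex commands, then exit
        "-p", str(pid),       # attach to the running process
        "-ex", "continue",    # resume until the next signal or exit
        "-ex", "bt",          # print the backtrace at the stop point
    ]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: maui_bt.py <pid of running maui>")
    # Assumption: the PID of maui is supplied by hand, e.g. taken from ps.
    # While this is waiting, submit a job with qsub from another terminal.
    subprocess.run(gdb_backtrace_command(int(sys.argv[1])), check=False)
```

The backtrace is far more useful if maui was built with debugging symbols (-g); without them the frames will mostly be bare addresses.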