[torqueusers] pbs_server failed
Ken Nielson
knielson at adaptivecomputing.com
Wed Mar 30 08:47:57 MDT 2011
Lydia,
Tell us more. Which version of TORQUE are you running?
Did you use --with-high-availability when you configured TORQUE?
Do your servers use a shared file system for $TORQUEHOME?
Ken
----- Original Message -----
From: "Lydia Heck" <lydia.heck at durham.ac.uk>
To: "Torque Users Mailing List" <torqueusers at supercluster.org>
Sent: Wednesday, March 30, 2011 7:32:17 AM
Subject: [torqueusers] pbs_server failed
The pbs_server failed for no apparent reason. Although it was configured,
"high availability" did not work because I had forgotten to add the second
server. However, there is still the question of why it failed.
When the system was finally brought back to life, everything seemed to work fine,
with the exception that jobs with multi-CPU requirements are no longer being
scheduled.
If I stop both servers, the pbs_mom daemons will die and take all the jobs with
them.
Any idea what I could do, short of restarting all the daemons and losing all the
jobs?
Best wishes,
Lydia
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers