[torqueusers] pbs_server failed

Ken Nielson knielson at adaptivecomputing.com
Wed Mar 30 08:47:57 MDT 2011


Lydia,

Tell us more. Which version of TORQUE are you running?

Did you use --with-high-availability when you configured TORQUE?

Do your servers use a shared file system for $TORQUEHOME?

Ken

----- Original Message -----
From: "Lydia Heck" <lydia.heck at durham.ac.uk>
To: "Torque Users Mailing List" <torqueusers at supercluster.org>
Sent: Wednesday, March 30, 2011 7:32:17 AM
Subject: [torqueusers] pbs_server failed


The pbs_server failed for no apparent reason. Although configured, "High 
availability" did not work because I had forgotten to add the second server.

However, there is still the question of why it failed.

When the system was finally brought back to life everything seemed to work fine,
with the exception that jobs with multi-cpu requirements are not being scheduled 
now.

If I stop both servers, the pbs_mom daemons will die and take all the jobs with 
them.

Any idea what I could do short of restarting all the daemons and losing all the 
jobs?

Best wishes,
Lydia

_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers

