[torqueusers] pbs_server failed

Lydia Heck lydia.heck at durham.ac.uk
Wed Mar 30 09:07:33 MDT 2011


Hi Ken,

version: 3.0.0

No, unless --with-high-availability is the default. But it still seems to work
with it.

Yes I am using a shared filesystem (gpfs).

Best wishes,
Lydia



On Wed, 30 Mar 2011, Ken Nielson wrote:

> Lydia,
>
> Tell us more. Which version of TORQUE are you running?
>
> Did you use --with-high-availability when you configured TORQUE?
>
> Do your servers use a shared file system for $TORQUEHOME?
>
> Ken
>
> ----- Original Message -----
> From: "Lydia Heck" <lydia.heck at durham.ac.uk>
> To: "Torque Users Mailing List" <torqueusers at supercluster.org>
> Sent: Wednesday, March 30, 2011 7:32:17 AM
> Subject: [torqueusers] pbs_server failed
>
>
> The pbs_server failed for no apparent reason. Although configured, "High
> availability" did not work, as I had forgotten to add the second server.
>
> However, there is still the question of why it failed.
>
> When the system was finally brought back to life everything seemed to work fine,
> with the exception that jobs with multi-cpu requirements are not being scheduled
> now.
>
> If I stop both servers the pbs_mom daemons will die and take all the jobs with
> them.
>
> Any idea what I could do short of restarting all the daemons and losing all the
> jobs?
>
> Best wishes,
> Lydia
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>

