[torqueusers] Torque 2.3.4 - Jobs not running
Wayne Mallett
wayne.mallett at jcu.edu.au
Tue Nov 25 18:15:12 MST 2008
G'day all,
I've found the problem, and it had nothing to do with the installation of
Torque on the compute nodes. It turned out to be queue configuration: the
queues require a "properties=..." to be assigned to nodes. On 18 nodes this
assignment was correct; on the other 15 I made typos.
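
For anyone hitting the same symptom, here is a rough sketch of the kind of
check that would have caught the typos. It assumes the usual nodes-file layout
of one node per line, a name followed by optional key=value attributes (such
as np=8) and then bare property words. The node names and the "newnodes"
property are made up for illustration; substitute whatever your queues
actually require.

```python
# Sketch only: list nodes whose property list lacks what a queue expects.
# Node names and the "newnodes" property are hypothetical examples.

def parse_nodes(text):
    """Map node name -> set of properties from nodes-file style text."""
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, *fields = line.split()
        # Fields containing "=" (such as np=8) are attributes, not properties.
        nodes[name] = {f for f in fields if "=" not in f}
    return nodes


def missing_property(nodes, required):
    """Return the sorted names of nodes that do not carry `required`."""
    return sorted(name for name, props in nodes.items() if required not in props)


SAMPLE = """\
node01 np=8 newnodes
node02 np=8 newnodes
node03 np=8 newnoeds
node04 np=8
"""

if __name__ == "__main__":
    for name in missing_property(parse_nodes(SAMPLE), "newnodes"):
        print(name, "lacks property newnodes")
```

A misspelled property is treated the same as a missing one, which matches the
behaviour seen here: the scheduler never places jobs on those nodes, while a
forced "qrun <jobid>" still works.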
Thanks,
Wayne
Wayne Mallett wrote:
> On Tue, Nov 25, 2008 at 07:05:26AM +1000, Wayne Mallett alleged:
> > > G'day all,
> > >
> > > I have recently upgraded to Torque 2.3.4 and have found jobs won't run on
> > > some servers unless I direct them to with a "qrun <jobid>". Using
> > > "tracejob <jobid>" on a job that wasn't forced to run, I get the following
> > > output
> >
> > Do you have a scheduler running?
> >
> > Note that 2.3.5 was released last week.
>
> Yes, I do have a scheduler (maui) running. The problem reported only
> occurs on _some_ compute nodes. I recently added 33 servers to the
> cluster I manage, 18 of these will accept jobs, 15 won't and I'm trying
> to diagnose why. All systems should be built to the same image (using
> XCAT). The pbs_server/maui daemons run on a VM that has been handling
> jobs (various versions) for several years now.
>
> Thanks,
> Wayne
>
--
Dr. Wayne Mallett
Email: Wayne.Mallett at jcu.edu.au
Smail: High Performance & Research Computing
James Cook University
Townsville Qld 4811
Phone: 0747815084