[torqueusers] Torque 2.3.4 - Jobs not running
Wayne Mallett
wayne.mallett at jcu.edu.au
Tue Nov 25 16:48:25 MST 2008
On Tue, Nov 25, 2008 at 07:05:26AM +1000, Wayne Mallett alleged:
> > G'day all,
> >
> > I have recently upgraded to Torque 2.3.4 and have found jobs won't run on
> > some servers unless I direct them to with a "qrun <jobid>". Using
> > "tracejob <jobid>" on a job that wasn't forced to run, I get the following
> > output
>
> Do you have a scheduler running?
>
> Note that 2.3.5 was released last week.
Yes, I do have a scheduler (maui) running. The problem reported only occurs
on _some_ compute nodes. I recently added 33 servers to the cluster I manage;
18 of these will accept jobs and 15 won't, and I'm trying to diagnose why. All
systems should be built from the same image (using XCAT). The pbs_server/maui
daemons run on a VM that has been handling jobs (under various versions) for
several years now.
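
One way to compare the nodes that accept jobs against those that don't is to
group them by the state pbs_server reports for each. The following is only a
minimal sketch: it assumes the usual "pbsnodes -a" text layout (node name
starting in the first column, followed by indented "key = value" attribute
lines), and the function names are hypothetical helpers, not part of Torque.

```python
import subprocess
from collections import defaultdict


def group_nodes_by_state(pbsnodes_output):
    """Map each reported state string to the node names in that state.

    Assumes node names start in column 0 and attributes are indented
    "key = value" lines, as "pbsnodes -a" normally prints them.
    """
    groups = defaultdict(list)
    current = None
    for line in pbsnodes_output.splitlines():
        if not line.strip():
            continue  # blank line between node records
        if not line[0].isspace():
            current = line.strip()  # unindented line: a new node record
        elif current is not None:
            key, sep, value = line.strip().partition(" = ")
            # Only the top-level "state" attribute is used; the long
            # "status = ..." line has a different key and is ignored.
            if sep and key == "state":
                groups[value].append(current)
    return dict(groups)


def read_pbsnodes():
    """Run "pbsnodes -a" (Torque client tools must be on PATH)."""
    result = subprocess.run(
        ["pbsnodes", "-a"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Example use on a machine with the Torque client tools installed:
#     for state, nodes in sorted(group_nodes_by_state(read_pbsnodes()).items()):
#         print(f"{state}: {len(nodes)} node(s): {', '.join(nodes)}")
```

If all 33 new nodes turn up in the same state, the difference is more likely
to be on the scheduler side than in what pbs_server knows about the nodes.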
Thanks,
Wayne
--
Dr. Wayne Mallett
Email: Wayne.Mallett at jcu.edu.au
Smail: High Performance & Research Computing
James Cook University
Townsville Qld 4811
Phone: 0747815084