[torqueusers] Torque 2.3.4 - Jobs not running

Wayne Mallett wayne.mallett at jcu.edu.au
Tue Nov 25 18:15:12 MST 2008


G'day all,

I've discovered the problem, and it had nothing to do with the installation of 
torque onto compute nodes.  It turned out to be the configuration of queues, 
which require a "properties=..." to be assigned to nodes.  On 18 nodes this 
assignment was correct; on the other 15 I made typos.
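
For anyone hitting the same thing, here is a rough sketch of how typos like 
these could be spotted by scanning the node list for entries that lack the 
property a queue requires.  It is not what I actually ran.  It assumes a nodes 
file where each line is a node name followed by whitespace-separated tokens, 
with tokens containing "=" (such as np=8) treated as attributes and the rest as 
properties.  The node names and the "bigmem" property are made up for 
illustration.

```python
def nodes_missing_property(nodes_text, required):
    """Return names of nodes whose property list lacks `required`.

    Assumed layout (hypothetical sample, not taken from a real cluster):
        <nodename> [key=value ...] [property ...]
    """
    missing = []
    for line in nodes_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, *tokens = line.split()
        # Tokens with '=' are treated as attributes; the rest as properties.
        props = {t for t in tokens if "=" not in t}
        if required not in props:
            missing.append(name)
    return missing


# Hypothetical sample: node02 has a misspelt property, node03 has none.
sample = """\
node01 np=8 bigmem
node02 np=8 bigmen
node03 np=8
"""

print(nodes_missing_property(sample, "bigmem"))
```

Any node reported here would never match a queue that asks for that property, 
which fits the symptom of jobs only starting when forced with qrun.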

Thanks,
Wayne

Wayne Mallett wrote:
> On Tue, Nov 25, 2008 at 07:05:26AM +1000, Wayne Mallett alleged:
>  > > G'day all,
>  > >
>  > > I have recently upgraded to Torque 2.3.4 and have found jobs won't run on
>  > > some servers unless I direct them to with a "qrun <jobid>".   Using
>  > > "tracejob <jobid>" on a job that wasn't forced to run, I get the following
>  > > output
>  >
>  > Do you have a scheduler running?
>  >
>  > Note that 2.3.5 was released last week.
> 
> Yes, I do have a scheduler (maui) running.  The problem reported only 
> occurs on _some_ compute nodes.  I recently added 33 servers to the 
> cluster I manage, 18 of these will accept jobs, 15 won't and I'm trying 
> to diagnose why.  All systems should be built to the same image (using 
> XCAT).  The pbs_server/maui daemons run on a VM that has been handling 
> jobs (various versions) for several years now.
> 
> Thanks,
> Wayne
> 

-- 
Dr. Wayne Mallett
Email:	Wayne.Mallet at jcu.edu.au
Smail:	High Performance & Research Computing
	James Cook University
	Townsville  Qld 4811
Phone:	0747815084

