[torqueusers] rejected request

Corey Hirschman corey at rentec.com
Thu Oct 21 07:24:56 MDT 2004


I am still running 1.0.1p6.  I had tested the 1.1.0p1 version, but decided to hold off on upgrading since our setup was working and some other people seemed to be having problems with the later versions.  It sounds like it may be time now.

I did notice in my testing of 1.1.0p1 that if I killed a compute node, the server would not grind to a halt, as it would in 1.0.1px versions.  Of course, this was on a tiny cluster, which does not accurately represent a real one.

Corey

On Thu, Oct 21, 2004 at 09:03:52AM +1000, Chris Samuel wrote:
> On Thu, 21 Oct 2004 04:53 am, Corey Hirschman wrote:
> 
> > Everything looks normal at first, Maui sees the job, checks available
> > resources, finds a node suitable to run the job on, submits the job, then
> > it gets rejected:
> >
> > maui.log.1:10/20 12:45:31 ERROR:    job '192275' cannot be started: (rc:
> > 15041 errmsg: 'Execution server rejected request'  hostlist: 'monster620')
> > maui.log.1:10/20 12:45:31 ERROR:    cannot start job '192275' in partition
> > DEFAULT
> >
> > I have looked on the node it tried to run the job on, monster620, and there
> > is no record of the job id in the MOM logs.  It does not appear that the
> > job was ever actually even submitted to the node, so I don't know how it
> > was rejected.
> 
> Which version of Torque are you running ?
> 
> This sounds very much like the bug that was annoying a lot of folks in recent
> versions, which the SuperCluster folks believe was fixed in 1.1.0p3.
> 
> We've just upgraded to that release (1.1.0p3) and things look fine, although 
> the usual trigger for us (rebooting a compute node or restarting a mom that's 
> been nuked by the Linux OOM killer) hasn't happened yet..
> 
> cheers,
> Chris
> -- 
>  Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
>  Victorian Partnership for Advanced Computing http://www.vpac.org/
>  Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
> 



> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://supercluster.org/mailman/listinfo/torqueusers
