[torqueusers] rejected request
Corey Hirschman
corey at rentec.com
Thu Oct 21 07:24:56 MDT 2004
I am still running 1.0.1p6. I had tested the 1.1.0p1 version, but decided to wait a bit before upgrading, since things were working for us and it seemed some other people had problems with the later versions. It sounds like it may be time now.
I did notice in my testing of 1.1.0p1 that if I killed a compute node, the server would not grind to a halt, as it would in the 1.0.1px versions. Of course, this was on a tiny test cluster, so it may not reflect how a real cluster behaves.
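In case it helps anyone chasing the same thing, here is a rough Python sketch for pulling the failed job starts out of a Maui log. The pattern is only a guess based on the single log line quoted below, so the real format may differ between Maui versions; the names REJECT_RE and rejected_jobs are made up for this example.

```python
import re

# Assumed pattern, derived only from the log line quoted in this thread.
# Other Maui versions may format this message differently.
REJECT_RE = re.compile(
    r"job '(?P<job>[^']+)' cannot be started: "
    r"\(rc:\s*(?P<rc>\d+)\s+errmsg: '(?P<errmsg>[^']*)'\s+"
    r"hostlist: '(?P<hosts>[^']*)'\)"
)

def rejected_jobs(lines):
    """Yield (job id, rc, errmsg, hostlist) for each failed job start."""
    for line in lines:
        m = REJECT_RE.search(line)
        if m:
            yield (m.group("job"), int(m.group("rc")),
                   m.group("errmsg"), m.group("hosts"))

# Sample lines copied from this thread, joined back onto single lines.
sample = [
    "10/20 12:45:31 ERROR:    job '192275' cannot be started: (rc: 15041 "
    "errmsg: 'Execution server rejected request'  hostlist: 'monster620')",
    "10/20 12:45:31 ERROR:    cannot start job '192275' in partition DEFAULT",
]
results = list(rejected_jobs(sample))
```

To run it against a real log, pass an open file instead of the sample list, e.g. rejected_jobs(open("maui.log.1")). Counting the host names that come back should show whether the rejections cluster on a few nodes.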
Corey
On Thu, Oct 21, 2004 at 09:03:52AM +1000, Chris Samuel wrote:
> On Thu, 21 Oct 2004 04:53 am, Corey Hirschman wrote:
>
> > Everything looks normal at first, Maui sees the job, checks available
> > resources, finds a node suitable to run the job on, submits the job, then
> > it gets rejected:
> >
> > maui.log.1:10/20 12:45:31 ERROR:    job '192275' cannot be started: (rc:
> > 15041 errmsg: 'Execution server rejected request'  hostlist: 'monster620')
> > maui.log.1:10/20 12:45:31 ERROR:    cannot start job '192275' in partition
> > DEFAULT
> >
> > I have looked on the node it tried to run the job on, monster620, and there
> > is no record of the job id in the MOM logs.  It does not appear that the
> > job was ever actually even submitted to the node, so I don't know how it
> > was rejected.
>
> Which version of Torque are you running ?
>
> This sounds very much like the bug that was annoying a lot of folks in recent
> versions, but which the SuperCluster folks believe was fixed in 1.1.0p3.
>
> We've just upgraded to that release (1.1.0p3) and things look fine, although
> the usual trigger for us (rebooting a compute node or restarting a mom that's
> been nuked by the Linux OOM killer) hasn't happened yet..
>
> cheers,
> Chris
> --
> Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
> Victorian Partnership for Advanced Computing http://www.vpac.org/
> Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://supercluster.org/mailman/listinfo/torqueusers
>