[torqueusers] rejected request
csamuel at vpac.org
Wed Oct 20 17:03:52 MDT 2004
On Thu, 21 Oct 2004 04:53 am, Corey Hirschman wrote:
> Everything looks normal at first, Maui sees the job, checks available
> resources, finds a node suitable to run the job on, submits the job, then
> it gets rejected:
> maui.log.1:10/20 12:45:31 ERROR: job '192275' cannot be started: (rc:
> 15041 errmsg: 'Execution server rejected request' hostlist: 'monster620')
> maui.log.1:10/20 12:45:31 ERROR: cannot start job '192275' in partition
> DEFAULT
> I have looked on the node it tried to run the job on, monster620, and there
> is no record of the job id in the MOM logs. It does not appear that the
> job was ever actually submitted to the node, so I don't know how it
> was rejected.
Which version of Torque are you running?
This sounds very much like the bug that has been annoying a lot of folks in
recent versions, and which the SuperCluster folks believe was fixed in 1.1.0p3.
We've just upgraded to that release and things look fine, although the usual
trigger for us (rebooting a compute node, or restarting a mom that's been
nuked by the Linux OOM killer) hasn't happened yet.
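
If it helps to see whether the rejections cluster on particular nodes (for
example ones that were recently rebooted), here is a rough Python sketch that
pulls the job id, return code and hostlist out of maui.log lines. The pattern
is modelled only on the one error line quoted above, so treat it as an
assumption: other Maui versions may format the message differently, and the
function names are made up for illustration.

```python
import re
from collections import Counter

# Pattern modelled on the single "cannot be started" line quoted in this
# thread; this is an assumption, not a documented Maui log format.
REJECT_RE = re.compile(
    r"ERROR:\s+job '(?P<job>[^']+)' cannot be started: "
    r"\(rc:\s*(?P<rc>\d+)\s+errmsg: '(?P<errmsg>[^']*)'\s+"
    r"hostlist: '(?P<hosts>[^']*)'\)"
)


def rejected_starts(lines):
    """Yield (job, rc, errmsg, hostlist) for each line matching the pattern."""
    for line in lines:
        m = REJECT_RE.search(line)
        if m:
            yield (m.group("job"), int(m.group("rc")),
                   m.group("errmsg"), m.group("hosts"))


def count_by_host(lines):
    """Count failed job starts per hostlist string."""
    return Counter(hosts for _, _, _, hosts in rejected_starts(lines))


# The two log lines from the original report, unwrapped onto one line each.
sample = [
    "10/20 12:45:31 ERROR: job '192275' cannot be started: (rc: 15041 "
    "errmsg: 'Execution server rejected request' hostlist: 'monster620')",
    "10/20 12:45:31 ERROR: cannot start job '192275' in partition DEFAULT",
]

for job, rc, errmsg, hosts in rejected_starts(sample):
    print(job, rc, errmsg, hosts)
print(count_by_host(sample))
```

To run it against a real log, pass an open file instead of the sample list,
e.g. count_by_host(open("maui.log.1")). Only the first of the two quoted lines
carries the hostlist, so the second one is deliberately not matched.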
Christopher Samuel - (03)9925 4751 - VPAC Systems & Network Admin
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia