[torqueusers] intermittent qsub failures

David Beer dbeer at adaptivecomputing.com
Wed Nov 20 12:10:02 MST 2013


What version are you getting this error on? We had a related fix recently.


On Tue, Nov 19, 2013 at 7:20 PM, Craig Artley <cartley at hotmail.com> wrote:

> I am seeing intermittent qsub failures. They seem to be related to load
> (several hundred jobs submitted). Every once in a while, qsub fails with
> "Unknown Job Id Error" or "can not locate new job":
>
>     Exit code = 153
>     Error: qsub: submit error (Unknown Job Id Error)
>
>     Exit code = 196
>     Error: qsub: submit error (Invalid request MSG=can not locate new job 630254.h2 (0 - Success))
>
> In the server log, I find messages like these:
>
> 11/19/2013 01:16:42;0080;PBS_Server.27108;Job;625027.h2;Unknown Job Id Error
>
> 11/19/2013 01:16:42;0080;PBS_Server.27108;Req;req_reject;Reject reply code=15001(Unknown Job Id Error MSG=cannot locate job), aux=0, type=DeleteJob, from joeuser at g4
>
> 11/19/2013 14:41:44;0001;PBS_Server.29564;Svr;PBS_Server;LOG_ERROR::Invalid request (15004) in req_jobscript, can not locate new job 630254.h2 (0 - Success)
> 11/19/2013 14:41:44;0100;PBS_Server.27141;Job;630253.h2;enqueuing into parallel, state 1 hop 1
> 11/19/2013 14:41:44;0080;PBS_Server.29564;Req;req_reject;Reject reply code=15004(Invalid request MSG=can not locate new job 630254.h2 (0 - Success)), aux=0, type=JobScript, from joeuser at g4
>
> So far I haven't found anything helpful. Please let me know if you have
> any idea what's going on.
>
> By the way, we were having lots of problems with Torque and NFS, but after
> configuring Torque as recommended in
> http://www.supercluster.org/pipermail/torqueusers/2011-March/012425.html,
> those problems went away and our reliability improved dramatically. Now all
> that remains are the two occasional problems above.
>
>   -craig
>
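To check whether these rejects really track submission load, one option is to tally them straight from the server log. What follows is a minimal Python sketch, not a Torque tool. It assumes the semicolon-separated record layout seen in the excerpts above (timestamp;event code;source;object class;object name;message), with each record on a single line in the real log file (the wrapping in the quoted excerpts comes from email). The script name and the helpers parse_line and count_reject_codes are hypothetical.

```python
import re
import sys
from collections import Counter

# Assumed record layout, inferred from the quoted log excerpts:
#   timestamp;event code;source;object class;object name;message
FIELDS = ("timestamp", "event", "source", "obj_class", "obj_name", "message")

# Pulls the numeric reject code and the request type out of a req_reject message,
# e.g. "Reject reply code=15001(...), aux=0, type=DeleteJob, ..."
REJECT_RE = re.compile(r"code=(\d+)\(.*?type=(\w+)")


def parse_line(line):
    """Split one log record into named fields; return None if it is malformed."""
    # maxsplit keeps any ';' inside the free-text message intact
    parts = line.rstrip("\n").split(";", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def count_reject_codes(lines):
    """Count req_reject records, keyed by (reject code, request type)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec["obj_name"] != "req_reject":
            continue
        match = REJECT_RE.search(rec["message"])
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Usage (hypothetical script name): python count_rejects.py LOGFILE [LOGFILE ...]
    totals = Counter()
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            totals.update(count_reject_codes(fh))
    for (code, req_type), n in totals.most_common():
        print(f"{code} {req_type}: {n}")
```

Running it over the server logs for the affected days and comparing the 15001/15004 counts against submission bursts would show whether the failures cluster under heavy load.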


-- 
David Beer | Senior Software Engineer
Adaptive Computing

