[torqueusers] intermittent qsub failures with 4.2.7

Craig Artley cartley at hotmail.com
Fri Mar 21 16:26:50 MDT 2014


Matt, thanks for the idea. What value are you using for clientretry? My torque.cfg currently has 5 for CLIENTRETRY and 0 for QSUBSLEEP:

QSUBSLEEP     0
CLIENTRETRY   5
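
As a stopgap on the submitting side, something like the following could retry a failed qsub. This is only a rough sketch, not anything from the Torque docs: the function name, the attempt count, and the delay are all made up for illustration, and it retries on any nonzero exit code rather than only on 196. I'm also not sure whether the job is actually queued when the "cannot locate new job" error comes back, so it may be worth checking qstat for the job ID before resubmitting to avoid duplicates.

```python
import subprocess
import time


def submit_with_retry(cmd, attempts=3, delay=5.0,
                      run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying on a nonzero exit code.

    Hypothetical helper: `attempts` and `delay` are illustrative defaults,
    not Torque settings. `run` and `sleep` are injectable for testing.
    Returns the result of the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result = None
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result  # submission succeeded
        if attempt < attempts:
            sleep(delay)  # wait before the next try
    return result  # every attempt failed; caller inspects returncode
```

Usage would be along the lines of submit_with_retry(["qsub", "job.sh"]), where job.sh stands in for the real job script; on success the job ID that qsub prints is in the result's stdout.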

  -craig

> From: msbritt at umich.edu
> To: torqueusers at supercluster.org
> Date: Fri, 21 Mar 2014 18:17:56 -0400
> Subject: Re: [torqueusers] intermittent qsub failures with 4.2.7
> 
> Hi Craig - I don't think that is the error message I remember seeing, 
> but we did have an issue with failures during rapid qsub submissions. 
> Since setting 'clientretry', we have not had a problem (failed attempts 
> are simply retried, so no submission ultimately fails):
> 
> http://docs.adaptivecomputing.com/torque/4-2-6/help.htm#topics/12-appendices/torque.cfgConfigFile.htm?Highlight=clientretry
> 
> In case that helps....
> 
>     - Matt
> 
> --------------------------------------------
> Matthew Britt
> CAEN HPC Group - College of Engineering
> msbritt at umich.edu
> 
> 
> On 21 Mar 2014, at 17:33, Craig Artley wrote:
> 
> > Hello, last November I inquired here about intermittent qsub failures 
> > that we see several times a day on our cluster. We were using 4.1.6, 
> > and a reply here indicated that this was a known problem that should 
> > be fixed in the (then-forthcoming) 4.2.7 release.
> >
> > Today I had a chance to build the new packages and apply them to all 
> > of the nodes as well as the server. (That went very well, by the way. 
> > I stopped the queues, let them drain out, refreshed and restarted 
> > everything, and the jobs started releasing again. Very nice!)
> >
> > However, I still got a couple of these qsub failures in a batch of 
> > 700+ jobs.
> >
> > Exit code = 196
> > Error: qsub: submit error (Invalid request MSG=cannot locate new job 
> > 1320429.h2 (0 - Success))
> >
> > So, do others see this? Am I missing some other configuration detail?
> >
> > -craig
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers