[torqueusers] intermittent qsub failures with 4.2.7

Craig Artley cartley at hotmail.com
Fri Mar 21 15:33:49 MDT 2014


Hello, last November I inquired here about intermittent qsub failures that we see several times a day on our cluster. We were using 4.1.6, and a reply here indicated that this was a known problem that should be fixed in the (then-forthcoming) 4.2.7 release.

Today I had a chance to build the new packages and install them on all of the nodes as well as the server. (That went very well, by the way. I stopped the queues, let them drain, refreshed and restarted everything, and the jobs started releasing again. Very nice!)

However, I still got a couple of these qsub failures in a batch of 700+ jobs.

Exit code = 196
Error: qsub: submit error (Invalid request MSG=cannot locate new job 1320429.h2 (0 - Success))

So, do others see this? Am I missing some other configuration detail?
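
In the meantime, a stopgap would be to wrap the submit call in a retry loop. The following is only a minimal sketch of that idea, not a fix for the underlying problem. It assumes that any non-zero exit from qsub (such as the 196 above) is safe to retry, which may not hold: the error message names a job id, so the failed submission may in fact have been queued, and checking the queue before resubmitting would be prudent. The function name submit_with_retry, the script name job.sh, and the attempt count and delay are all hypothetical.

```python
import subprocess
import time


def submit_with_retry(cmd, attempts=3, delay=5.0,
                      run=subprocess.run, sleep=time.sleep):
    """Run a submit command, retrying whenever it exits non-zero.

    Returns (stdout, tries_used). Raises RuntimeError if every attempt fails.
    The `run` and `sleep` parameters exist only so the loop can be exercised
    without a real scheduler.
    """
    last_err = ""
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            # On success qsub prints the new job id on stdout.
            return result.stdout.strip(), attempt
        last_err = "exit code %d: %s" % (result.returncode,
                                         result.stderr.strip())
        if attempt < attempts:
            # Assumption: a short pause is enough for the server to recover.
            sleep(delay)
    raise RuntimeError("submit failed after %d attempts (%s)"
                       % (attempts, last_err))


if __name__ == "__main__":
    # job.sh is a placeholder for a real submit script.
    job_id, tries = submit_with_retry(["qsub", "job.sh"])
    print("submitted %s after %d attempt(s)" % (job_id, tries))
```

Because a retry after a partially successful submit could queue the same job twice, this is only reasonable for jobs that are harmless to run more than once.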

  -craig
 		 	   		  