[torqueusers] qstat error 43776
Woods, David M. Dr.
woodsdm2 at muohio.edu
Thu Aug 9 09:41:13 MDT 2007
I have a user running an application that submits a large number (2000+) of additional jobs and then periodically checks one of them using qstat. If it can't get the status, the parent job exits with an error. During this process, qstat occasionally returns 43776. How can I find out what this error means?
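One way to start narrowing this down, sketched here in Python under an assumption the post does not confirm: if the application obtains the value from a call that returns a raw wait()-style status (as C's system() does), then the exit code sits in the high byte rather than being the number itself. The helper name decode_wait_status is hypothetical and only illustrates the arithmetic.

```python
def decode_wait_status(status: int) -> tuple[int, int]:
    """Split a 16-bit wait()-style status into (exit_code, signal_number).

    Assumption: the value was returned as a raw wait status. If the
    application reports it some other way, this decoding does not apply.
    """
    exit_code = (status >> 8) & 0xFF   # high byte: exit code of the child
    signal_number = status & 0x7F      # low 7 bits: terminating signal, 0 if none
    return exit_code, signal_number


exit_code, signal_number = decode_wait_status(43776)
print(exit_code, signal_number)  # prints: 171 0
```

Under that assumption, 43776 (0xAB00) would correspond to qstat exiting with code 171 and no signal. What exit code 171 means to qstat would still have to be checked against the Torque sources or documentation.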
I'm guessing that this error means qstat couldn't get the status of the specific job in a reasonable time. With 2000+ jobs queued and running, I notice that qstat can take a while to return the job information. Are there any configuration changes that would help improve the performance of qstat?
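Independent of any server-side tuning, the parent job could tolerate an occasional slow or failed qstat instead of exiting on the first failure. The following is a minimal sketch of that idea, not the application's actual logic: the names retry and qstat_succeeds, the attempt count, the delays, and the 120-second timeout are all hypothetical choices for illustration.

```python
import subprocess
import time
from typing import Callable


def retry(check: Callable[[], bool], attempts: int = 5,
          delay: float = 10.0, sleep=time.sleep) -> bool:
    """Call check() up to `attempts` times, doubling the wait after each failure."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            # Back off so a busy pbs_server is not hit with rapid repeat queries.
            sleep(delay * (2 ** attempt))
    return False


def qstat_succeeds(job_id: str) -> bool:
    """Return True if `qstat <job_id>` exits with status 0 within the timeout."""
    try:
        result = subprocess.run(
            ["qstat", job_id],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=120,  # assumed upper bound for a slow server
        )
    except (OSError, subprocess.TimeoutExpired):
        # qstat missing from PATH, or it did not answer in time.
        return False
    return result.returncode == 0
```

A caller would then use something like retry(lambda: qstat_succeeds(job_id)) and treat the job status as unavailable only after every attempt has failed.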
We're using Torque 2.1.6 with Maui 3.2.6p17 on a cluster with 128 dual-CPU nodes.