[torqueusers] problems with many job submissions

Steve Young chemadm at hamilton.edu
Thu May 14 09:45:22 MDT 2009


Hi all,
	I have been having a problem with a user who submits thousands of
jobs at once. Most of these jobs either finish in a matter of seconds
or don't appear to do anything at all. I'm using torque, maui and gold,
with a routing queue to hold the 10,000 jobs they submit (all
single-CPU jobs). The routing queue works fine and routes to the proper
execution queue (which can run 116 jobs at a time). However, as the
system chews through the jobs, they finish so quickly that it has a
hard time keeping up. The mysql server goes to 100% and there is even a
noticeable load on goldd. I suspect that, with so many jobs starting
and stopping so fast, creating the reservations and doing the other
record-keeping in maui/gold is what generates this load.
	I'm hoping to get the user to change how they submit jobs (but they
can be difficult at times). I suspect that if each job ran for even 5
minutes or so, the system could at least keep up. So I'm curious
whether anyone else has run into this type of problem and what you did
to solve it. Are there any changes I could make in torque/maui/gold to
help alleviate this?
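
For illustration, one way the submissions might be restructured is to
bundle many short tasks into each job, so that every job runs long
enough for the record-keeping to keep up. Below is a minimal sketch in
Python, not a tested recipe. The function names, the `./run_task`
command, the output file names and the batch size of 100 are all
hypothetical; only the figure of 10,000 single-CPU jobs comes from the
description above.

```python
from pathlib import Path


def chunk(tasks, size):
    """Split a list of tasks into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [tasks[i:i + size] for i in range(0, len(tasks), size)]


def wrapper_script(batch):
    """Build the text of one job script that runs every task in `batch` in turn."""
    lines = [
        "#!/bin/sh",
        "#PBS -l nodes=1:ppn=1",  # one CPU, matching the single-CPU jobs described
    ]
    lines.extend(batch)  # each task is one shell command line, run serially
    return "\n".join(lines) + "\n"


def write_wrappers(tasks, size, outdir):
    """Write one wrapper script per batch into `outdir` and return the paths."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, batch in enumerate(chunk(tasks, size)):
        path = out / f"batch_{n:04d}.sh"  # hypothetical naming scheme
        path.write_text(wrapper_script(batch))
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Hypothetical workload: 10,000 short tasks, 100 per submitted job.
    tasks = [f"./run_task {i}" for i in range(10000)]
    scripts = write_wrappers(tasks, 100, "wrappers")
    print(f"{len(scripts)} wrapper scripts written")
```

With these assumed numbers, the scheduler would see 100 submissions
instead of 10,000, and each job would last roughly 100 times as long as
a single task. Each generated script would then be submitted with qsub
in the usual way.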

Torque version: 2.2.1
Gold version: 2.0.0.0
Maui version: 3.2.6p14

(I'm expecting to upgrade all of these to the latest stable versions
this summer.)

Any suggestions are welcome. Thanks,

-Steve