[torqueusers] Adding to a cluster

Jeremy Mann jeremy at biochem.uthscsa.edu
Wed Sep 24 15:37:36 MDT 2008

Wayne Mallett wrote:
> G'day All,
> All nodes were visible.  I checked torque and maui configurations and
> then restarted torque and maui again.  What I noticed during all
> the checks I made was that qmgr wasn't reporting the correct amount of
> resources used when I compared its report against qstat output.  After
> the restart of torque and maui I discovered another ~350 new jobs (on
> top of the ~150 running) showed up on the system - we only have ~400 CPUs.
> My guess is that the system was having trouble catching up with an
> influx of jobs - coming from a user's script.  I've asked the user to
> put a sleep command between job submissions in the script.  I restarted
> torque several times during my investigations.  It wasn't until maui was
> restarted that the problem showed itself.

Wayne, we had this same problem a few months ago. The solution was to
create a routing queue just for this user. Basically, what it does is hold
all of his jobs (nearly 100k) and keep the default queue at around 1000
jobs.
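
For reference, a minimal sketch of what such a setup might look like as
qmgr directives. The queue names "feeder" and "batch" and the user name
"heavyuser" are placeholders, not our actual configuration, and the
1000 limit is only taken from the figure above. It assumes the execution
queue already exists and that the user submits to the routing queue
(for example with qsub -q feeder).

```
# Routing queue that holds the user's backlog (hypothetical name "feeder")
create queue feeder
set queue feeder queue_type = Route
set queue feeder route_destinations = batch
# Restrict the routing queue to the one heavy submitter (placeholder user)
set queue feeder acl_user_enable = True
set queue feeder acl_users = heavyuser
set queue feeder enabled = True
set queue feeder started = True

# Cap the execution queue; jobs that do not fit stay in the routing queue
set queue batch max_queuable = 1000
```

The idea is that once the execution queue reaches its max_queuable
limit, the remaining jobs wait in the routing queue and are moved over
as slots free up, so the scheduler only ever sees about 1000 jobs at a
time.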

Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
Phone: (210) 567-2672
