[torqueusers] Adding to a cluster

Jeremy Mann jeremy at biochem.uthscsa.edu
Wed Sep 24 15:37:36 MDT 2008


Wayne Mallett wrote:
> G'day All,
>
> All nodes were visible.  I checked torque and maui configurations and
> then restarted torque and maui again.  What I noticed in and around all
> the checks I made was that qmgr wasn't reporting the correct amount of
> resources used when compared against the output of qstat.  After
> the restart of torque and maui I discovered another ~350 new jobs (on
> top of the ~150 running) showed up on the system - we only have ~400 CPUs.
>
> My guess is that the system was having trouble catching up with an
> influx of jobs - coming from a user's script.  I've asked the user to
> put a sleep command between job submissions in the script.  I restarted
> torque several times during my investigations.  It wasn't until maui was
> restarted that the problem showed itself.

Wayne, we had this same problem a few months ago. The solution was to
create a routing queue just for that user. Basically, what it does is
hold all of his jobs (nearly 100k) and keep the default queue at around
1000 jobs.
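
A minimal sketch of that kind of setup, written as qmgr directives. The
queue names (feeder, batch) and the user name (heavyuser) are
placeholders, and the limit of 1000 is only the figure mentioned above;
none of this is copied from a real configuration. It assumes the
destination queue already exists and that the routing queue keeps
retrying jobs the destination rejects for being full, so check the
TORQUE admin guide for your version before relying on it.

```
# Routing queue that only the heavy submitter may use.
# "feeder", "batch" and "heavyuser" are hypothetical names.
create queue feeder
set queue feeder queue_type = Route
set queue feeder route_destinations = batch
set queue feeder acl_user_enable = True
set queue feeder acl_users = heavyuser
set queue feeder enabled = True
set queue feeder started = True

# Cap the execution queue; jobs beyond the cap should wait in feeder.
set queue batch max_queuable = 1000
```

The user would then submit with "qsub -q feeder". Note that
max_queuable counts every job in batch, not only this user's, so the
cap is shared with anyone else submitting to that queue directly.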


-- 
Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 567-2672


