[torqueusers] Cluster has enough CPUs but job refuses to start

Jeremy Mann jeremy at biochem.uthscsa.edu
Tue Mar 15 15:49:54 MDT 2011

I recently added 4 nodes to our cluster, and now queued jobs refuse to
start, reporting that the requested number of procs in partition DEFAULT
has been exceeded.

 3 Active Jobs      96 of  144 Processors Active (66.67%)
                        12 of   20 Nodes Active      (60.00%)

Total Jobs: 16   Active Jobs: 3   Idle Jobs: 0   Blocked Jobs: 13

The 13 "blocked" jobs are requesting a mix of 32 and 16 CPUs. We clearly
have CPUs free (96 of 144 in use, so 48 idle), so I do not have a clue
why these 13 jobs remain blocked. I tried using qalter to lower the
number of processors requested, for example:

qalter -l nodes=2:ppn=4 4832

But jobid 4832 still says:

Holds:    Defer
Messages:  exceeds available partition procs
PE:  8.00  StartPriority:  119
cannot select job 4832 for partition DEFAULT (job hold active)
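
For reference, here is the arithmetic behind my claim as a quick shell
sketch. The numbers are copied from the output above; the variable names
are only illustrative. It compares raw processor counts and assumes
nothing about how the idle processors are spread across nodes.

```shell
#!/bin/sh
# Processor counts copied from the queue summary above.
total_procs=144
active_procs=96
free_procs=$((total_procs - active_procs))

# Job 4832 after the qalter above: nodes=2:ppn=4.
nodes=2
ppn=4
requested=$((nodes * ppn))

echo "free=$free_procs requested=$requested"

# Raw count comparison only; per-node placement is not considered.
if [ "$requested" -le "$free_procs" ]; then
    echo "request fits in the idle processors"
else
    echo "request exceeds the idle processors"
fi
```

The 8 processors requested match the "PE:  8.00" shown for the job, so
the qalter change does appear to have been picked up.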

Jeremy Mann
jeremy at biochem.uthscsa.edu

University of Texas Health Science Center
Bioinformatics Core Facility
Phone: (210) 767-3419
