[torqueusers] Cluster has enough CPUs but job refuses to start
Jeremy Mann
jeremy at biochem.uthscsa.edu
Tue Mar 15 15:49:54 MDT 2011
I recently added 4 nodes to our cluster, and now queued jobs refuse to
start, reporting that the requested number of procs exceeds what is
available in partition DEFAULT.
3 Active Jobs 96 of 144 Processors Active (66.67%)
12 of 20 Nodes Active (60.00%)
Total Jobs: 16 Active Jobs: 3 Idle Jobs: 0 Blocked Jobs: 13
The 13 "blocked" jobs are requesting a mix of 32 and 16 CPUs. We clearly
have CPUs to spare (96 of 144 in use, so 48 free), so I have no idea why
these 13 jobs remain blocked. I tried using qalter to lower the amount
requested, for example:
qalter -l nodes=2:ppn=4 4832
But jobid 4832 still says:
Holds: Defer
Messages: exceeds available partition procs
PE: 8.00 StartPriority: 119
cannot select job 4832 for partition DEFAULT (job hold active)
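
For what it's worth, here is a minimal shell sketch of the arithmetic I am
relying on. It uses only the totals from the summary above plus the request
sizes I mentioned (32, 16, and the 8 procs from nodes=2:ppn=4). The variable
names are just illustrative, and it compares aggregate counts only; it says
nothing about how the free procs are spread across individual nodes.

```shell
#!/bin/sh
# Totals taken from the scheduler summary quoted above.
total_procs=144
active_procs=96

# Processors not currently allocated to running jobs.
free_procs=$((total_procs - active_procs))
echo "free procs: $free_procs"

# Compare each request size against the aggregate free count.
# 8 corresponds to the qalter'd request nodes=2:ppn=4.
for req in 32 16 8; do
    if [ "$req" -le "$free_procs" ]; then
        echo "request of $req procs: fits in aggregate"
    else
        echo "request of $req procs: exceeds free procs"
    fi
done
```

By this count every one of those request sizes fits on its own, which is why
the "exceeds available partition procs" message confuses me.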
--
Jeremy Mann
jeremy at biochem.uthscsa.edu
University of Texas Health Science Center
Bioinformatics Core Facility
http://www.bioinformatics.uthscsa.edu
Phone: (210) 767-3419
^^^^^^^^
** NEW OFFICE NUMBER **