[Mauiusers] Job remains queued for unpredictable amount of time
jd at shadlen.org
Thu Jan 1 22:00:35 MST 2009
I still have the problem that jobs requesting all nodes of my small
cluster stay queued for an unpredictable amount of time, but I now
have some additional information:
pbsnodes reports that all nodes (7 in the following example) are free.
"diagnose -n" says that all nodes are Idle.
showq says that the job is in IDLE state.
The job can be started with qrun.
checkjob says "cannot select job 38 for partition DEFAULT (startdate in
This is what I see in the Maui log file:
INFO: 7 feasible tasks found for job 38:0 in partition DEFAULT (7 Needed)
ALERT: inadequate tasks to allocate to job 38:0 (1 < 7)
ERROR: cannot allocate nodes to job '38' in partition DEFAULT
What does the second line (the ALERT) mean?
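To make the mismatch in these lines explicit, here is a small Python
sketch that pulls the numbers out of the three log lines quoted above.
The regular expressions are written only against these lines, and the
labels "allocatable" and "required" for the two numbers in the ALERT
line are my own guess at their meaning, not something taken from the
Maui documentation.

```python
import re

# The three log lines quoted in this message.
LOG = """\
INFO: 7 feasible tasks found for job 38:0 in partition DEFAULT (7 Needed)
ALERT: inadequate tasks to allocate to job 38:0 (1 < 7)
ERROR: cannot allocate nodes to job '38' in partition DEFAULT
"""

# Patterns derived from the quoted lines only (assumption: other Maui
# versions may format these messages differently).
FEASIBLE = re.compile(
    r"INFO:\s+(\d+) feasible tasks found for job (\S+) "
    r"in partition (\S+) \((\d+) Needed\)"
)
INADEQUATE = re.compile(
    r"ALERT:\s+inadequate tasks to allocate to job (\S+) \((\d+) < (\d+)\)"
)


def summarize(log_text):
    """Collect the task counts reported by the INFO and ALERT lines."""
    summary = {}
    for line in log_text.splitlines():
        match = FEASIBLE.search(line)
        if match:
            summary["feasible"] = int(match.group(1))
            summary["needed"] = int(match.group(4))
            continue
        match = INADEQUATE.search(line)
        if match:
            # Guessed meaning: tasks that could be allocated < tasks required.
            summary["allocatable"] = int(match.group(2))
            summary["required"] = int(match.group(3))
    return summary


result = summarize(LOG)
print(result)
```

If that reading is right, the feasibility check accepts all 7 nodes,
but the later allocation step only finds 1 usable task.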
Looking back through the mailing list archive I have found some other
emails that seem to be related to a similar problem, but I didn't find
any answers to these emails. Like these other users, I am using the
CPULOAD node allocation policy.
Here is what I am wondering: Could it be that Maui uses not only the
relative CPU load on the different nodes to decide which nodes to
select, but also an absolute threshold for the CPU load, so that a node
whose current load exceeds this threshold is not allocated at all? If
so, what is this threshold and can it be changed?
(I have already tried setting the node availability policy to UTILIZED,
which did not fix the problem.) The machines in my cluster are not pure
TORQUE compute nodes. Thus, there might be other processes running on
the machines, causing some CPU load. This is the main reason for
selecting a node allocation policy that would prefer nodes with low CPU
load for submitted TORQUE jobs.
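To illustrate the behavior I suspect, here is a toy Python model. It is
not Maui's code: the function allocate, the max_load parameter, the
node names, and the load values are all made up for illustration.
Without a threshold the nodes are only ranked by load; with a
threshold, nodes above it are dropped before ranking, which would
explain a job that needs 7 tasks but can only get 1.

```python
def allocate(loads, tasks_needed, max_load=None):
    """Pick the least-loaded nodes, one task per node.

    loads: dict mapping node name -> current CPU load (hypothetical values).
    max_load: hypothetical absolute threshold; None means "rank only".
    Returns a list of node names, or None if too few nodes qualify.
    """
    # Drop nodes above the absolute threshold, if one is set.
    candidates = {
        name: load
        for name, load in loads.items()
        if max_load is None or load <= max_load
    }
    if len(candidates) < tasks_needed:
        return None  # corresponds to "inadequate tasks to allocate"
    # Relative ranking: prefer the nodes with the lowest load.
    ranked = sorted(candidates, key=candidates.get)
    return ranked[:tasks_needed]


# Seven nodes with some background load from non-TORQUE processes.
loads = {
    "node1": 0.05, "node2": 0.40, "node3": 0.55, "node4": 0.30,
    "node5": 0.70, "node6": 0.25, "node7": 0.90,
}

ranking_only = allocate(loads, 7)
with_threshold = allocate(loads, 7, max_load=0.10)
print(ranking_only)
print(with_threshold)
```

With ranking only, all 7 nodes are handed out. With the made-up
threshold of 0.10, only node1 qualifies and the 7-node job cannot
start, even though every node is reported as free.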
Thanks for any tips you might have,
Jochen Ditterich, Ph.D.
Center for Neuroscience
University of California
1544 Newton Court
Davis, CA 95618
office: +1 (530) 754-5084
lab: +1 (530) 754-6987
fax: +1 (530) 757-8827