[torqueusers] Nodes not being allocated
Lippert, Kenneth B.
Kenneth.Lippert at alcoa.com
Fri Mar 17 08:50:02 MST 2006
I have been running a Torque 1.2 cluster for about a year now on Opteron
SuSE Linux machines. I have several 2-CPU machines and several 4-CPU
machines. When defining my nodes, I specified the number of processors
for each node (np=2 or np=4). Each node was also given a property such as
"production" or "experimental". I submitted jobs with
"-l nodes=1:production" (all jobs were single-CPU jobs). Jobs were routed
correctly and ran as expected: up to 4 jobs could run on each 4-CPU
machine and up to 2 on each 2-CPU machine. Additional jobs queued until
a processor became free.
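For reference, a TORQUE server nodes file (server_priv/nodes) describing a
setup like this lists each host with its processor count followed by its
properties. The hostnames and the property assignments below are
hypothetical placeholders for illustration, not the actual machines:

```
opteron4a np=4 production
opteron4b np=4 production
opteron2a np=2 experimental
```

With a file like this, "-l nodes=1:production" asks for one virtual
processor on any node carrying the "production" property.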
I recently upgraded to Torque 2.0 (p6) and installed a new master. The
new cluster is separate from the old one, although I have moved some of
the old client machines to it. My problem is that, even though the
queues and nodes are configured identically, jobs no longer fill all of
the processors on the nodes. My 2-CPU machine takes only one job, even
though it is clearly configured with np=2. One of the 4-CPU machines
takes 2 jobs, but the other takes only 1. If I submit a test job with
"-l nodes=4", it fails with the message "there are not enough resources
to fulfill the request". The same test on the old 1.2 cluster works fine
and places the job on a 4-CPU machine.
I have examined all of the pbsnodes and qmgr output and can see no
difference between the queue or node configurations of the two clusters.
I am using the same version of Maui on both the old and new clusters.
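One way to make such a comparison systematic is to capture the node and
server/queue configuration from each master and diff the results. This is
a sketch; the output file names and the "oldmaster"/"newmaster" host
names are hypothetical:

```shell
# Run on each cluster's master (pbs_server host).
pbsnodes -a > "pbsnodes-$(hostname).txt"          # all nodes with np, properties, state
qmgr -c "print server" > "qmgr-$(hostname).txt"   # server and queue settings

# After copying both sets of files to one machine (hypothetical names):
diff qmgr-oldmaster.txt qmgr-newmaster.txt
diff pbsnodes-oldmaster.txt pbsnodes-newmaster.txt
```

Differences in node state or job lists are expected; differences in np,
properties, or server attributes are the interesting ones.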