[torquedev] possible bug found

Jan Lindheim lindheim at cacr.caltech.edu
Mon Sep 14 18:20:13 MDT 2009

I sent the following bug report to the torqueusers list about a week ago,
but have not received any responses yet.  I'm posting it here as well, to
make sure the developers see it too.

We are running torque-2.3.6 with maui-3.2.6p21
and seem to have found some strange behavior that looks like a bug.

We have a cluster of nodes with two CPUs each; some are dual-core and
some are quad-core Opterons.  We have tagged the nodes with two quad-core
CPUs as "core8" and the nodes with two dual-core CPUs as "core4"
to describe the total number of CPU cores in a node.

Both core8 and core4 nodes are available in the production queue.
A node in production has the tag "production", and a node
that has been taken out of production for testing gets the
tag "system" instead.
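For readers unfamiliar with node properties, a setup like this would typically be expressed in the TORQUE server's nodes file, where each line gives a host name, its processor count (np=) and its properties.  The host names below are hypothetical and only illustrate the tagging scheme described above, including one core4 node moved from "production" to "system":

```
node001 np=8 core8 production
node002 np=4 core4 production
node003 np=4 core4 system
```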

With a few core4 nodes taken out of production, tagged with "system"
instead of "production", we see the following behavior:

qsub -I -l nodes=16:core4+1:core8

works as expected.  The nodes that have been tagged with "system"
are avoided.

If the following is tried:

qsub -I -l nodes=1:core8+16:core4

which should give the same result, nodes that have been tagged
with "system" end up in my allocation.

The default queue has the setting:
set queue productionQ resources_default.neednodes = production

which should restrict jobs in this queue to nodes that
have been tagged "production".

It looks like the logic is applied differently depending on
which node class is listed first in the request.

To add to this:
qsub -I -l nodes=1:core8+16:core4
should be equivalent to:
qsub -I -l nodes=1:core8:production+16:core4:production
but instead TORQUE appears to treat it as:
qsub -I -l nodes=1:core8:production+16:core4
so the core4 nodes may be any core4 nodes, including those tagged "system".
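To make the expected and observed behavior concrete, here is a minimal Python sketch that only models the symptoms described above; it is not TORQUE's actual code.  The function names apply_neednodes and observed_behavior are hypothetical.  The first function appends the queue's neednodes property to every "+"-separated node group (what the report expects), and the second appends it only to the first group (what the report observes):

```python
def apply_neednodes(nodespec: str, prop: str) -> str:
    """Expected behavior: add `prop` to every '+'-separated node group."""
    return "+".join(f"{group}:{prop}" for group in nodespec.split("+"))


def observed_behavior(nodespec: str, prop: str) -> str:
    """Hypothetical model of the reported symptom: only the first group gets `prop`."""
    first, sep, rest = nodespec.partition("+")
    return f"{first}:{prop}{sep}{rest}"


print(apply_neednodes("1:core8+16:core4", "production"))
print(observed_behavior("1:core8+16:core4", "production"))
```

Under this model, "16:core4+1:core8" happens to work because the core4 group, which contains the nodes retagged "system", is listed first and so receives the "production" property.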

Jan Lindheim
