[torqueusers] Torque lets in unsatisfiable job requests

Bram Metsch metsch at ins.uni-bonn.de
Fri Apr 18 07:48:56 MDT 2008


Hi,

at our site, we run two clusters sharing the same frontend (which runs
Torque/Maui). In the first cluster, all nodes have 2 compute cores and 4 or 6GB
of RAM. The second cluster consists of nodes with 8 compute cores and 4 or
16GB of RAM.

The queues for these two clusters are completely separated, i.e. the user has
to select the cluster of her/his choice by specifying the correct routing queue.
The job is then routed to one of the execution queues (for example "short",
"medium" or "long"). For these execution queues, the queue-to-nodes mapping is
done with Maui's PLIST parameter, e.g.

CLASSCFG[short]         PRIORITY=500    PLIST=bit64^

to specify that all jobs in the queue "short" can only be run on the nodes in
partition "bit64".

However, we observe that Torque still accepts job requests such as these:

1) The user submits her/his job to the queue serving the two-core nodes, but
   requests 4 or 8 processors per node.
2) The user requests (per processor!) more memory than is available on a single
   node.
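To make the two failure cases concrete, here is a minimal Python sketch of the
kind of check one would want applied at submission time, using the node
configurations listed above. The cluster labels, the NodeType class and the
satisfiable() function are hypothetical illustrations, not Torque or Maui code;
the sketch also assumes that all processors requested per node must fit on one
physical node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    """One kind of node: number of cores and RAM in GB."""
    cores: int
    mem_gb: int


# Node types as described in the post. The keys "two_core" and
# "eight_core" are hypothetical labels for the two clusters.
CLUSTERS = {
    "two_core": [NodeType(2, 4), NodeType(2, 6)],
    "eight_core": [NodeType(8, 4), NodeType(8, 16)],
}


def satisfiable(cluster: str, ppn: int, mem_per_proc_gb: float) -> bool:
    """Return True if at least one node type in `cluster` can host `ppn`
    processors with `mem_per_proc_gb` GB of memory each."""
    return any(
        ppn <= node.cores and ppn * mem_per_proc_gb <= node.mem_gb
        for node in CLUSTERS[cluster]
    )


if __name__ == "__main__":
    # Case 1: too many processors per node for the two-core cluster.
    print(satisfiable("two_core", 4, 1))    # False
    # Case 2: more memory per processor than any single node has.
    print(satisfiable("two_core", 1, 8))    # False
    # A valid request: 2 processors x 3 GB fits on a 6 GB two-core node.
    print(satisfiable("two_core", 2, 3))    # True
```

A request that fails this test can never be scheduled, which is exactly the
situation described below.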

Of course, Maui is not able to create a reservation for these jobs, so they
remain in the queue and are never executed. The users, however, believe their
requests were correct because qsub did not produce an error message. Is there a
way to make Torque reject such requests already at submission time?

Best regards,

Bram.
-- 
Dipl. Math. Bram Metsch
Universitaet Bonn
Institut fuer Numerische Simulation
Wegelerstrasse 6
53115 Bonn
Germany
Phone: +49 228 733849
Fax:   +49 228 737527
http://wissrech.ins.uni-bonn.de/index.php4?nav=people_staff_metsch