[torqueusers] Torque lets in unsatisfiable job requests
Bram Metsch
metsch at ins.uni-bonn.de
Fri Apr 18 07:48:56 MDT 2008
Hi,
at our site, we run two clusters sharing the same frontend (which runs
Torque/Maui). In the first cluster, all nodes have 2 compute cores and 4 or 6GB
of RAM. The second cluster consists of nodes possessing 8 compute cores and 4 or
16GB of RAM.
The queues for these two clusters are completely separated, i.e. the user has
to select the cluster of her/his choice by specifying the correct routing queue.
The job then hops to one of the executing queues (for example "short",
"medium" or "long"). For these execution queues, the mapping queue->nodes is
accomplished by Maui's PLIST parameter, e.g.
CLASSCFG[short] PRIORITY=500 PLIST=bit64^
to specify that all jobs in the queue "short" can only be run on the nodes in
partition "bit64".
However, we observe that Torque still accepts job requests like these:
1) The user queues her/his job into the queue serving the two-core nodes, but
requests 4 or 8 processors per node
2) The user requests (per processor!) more memory than available on a single
node.
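
To make the two cases concrete, here is a minimal Python sketch of the
feasibility check that such requests fail. It is not Torque or Maui code: the
cluster names, the limits table and the function check_request are
hypothetical and only mirror the node sizes described above. The memory test
compares the total requested on one node (processors per node times memory per
processor) with the largest node, which also covers case 2.

```python
# Hypothetical limits, taken from the node descriptions in this mail:
# cores per node and the largest node memory (in GB) of each cluster.
CLUSTER_LIMITS = {
    "cluster1": {"cores_per_node": 2, "max_mem_gb": 6},
    "cluster2": {"cores_per_node": 8, "max_mem_gb": 16},
}


def check_request(cluster, ppn, mem_per_proc_gb):
    """Return the reasons why a request can never be satisfied (empty if none)."""
    limits = CLUSTER_LIMITS[cluster]
    problems = []

    # Case 1: more processors per node than any node in the cluster has.
    if ppn > limits["cores_per_node"]:
        problems.append(
            "ppn=%d exceeds %d cores per node" % (ppn, limits["cores_per_node"])
        )

    # Case 2: memory needed on one node exceeds the largest node's RAM.
    mem_per_node_gb = ppn * mem_per_proc_gb
    if mem_per_node_gb > limits["max_mem_gb"]:
        problems.append(
            "%g GB per node exceeds %g GB of RAM"
            % (mem_per_node_gb, limits["max_mem_gb"])
        )

    return problems


if __name__ == "__main__":
    # A request for 4 processors per node on the two-core cluster.
    for reason in check_request("cluster1", ppn=4, mem_per_proc_gb=1):
        print("unsatisfiable:", reason)
```

An empty list means the request fits on at least the largest node of the
chosen cluster; any entry is a reason for which no reservation can ever be
created.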
Of course, Maui is not able to create a reservation for these jobs, so they
remain in the queue and are never executed. The users, however, think
their requests were correct because qsub did not produce an error message. Is
there a way to make Torque reject these requests at submission time?
Best regards,
Bram.
--
Dipl. Math. Bram Metsch
Universitaet Bonn
Institut fuer Numerische Simulation
Wegelerstrasse 6
53115 Bonn
Germany
Phone: +49 228 733849
Fax: +49 228 737527
http://wissrech.ins.uni-bonn.de/index.php4?nav=people_staff_metsch