[torqueusers] multiple jobs: yes, no, maybe

André Gemünd andre.gemuend at scai.fraunhofer.de
Fri Dec 21 00:25:30 MST 2012


Hi Jack,

----- Original Message -----
> The user wanted to know if it was possible to have more than one job
> running on a node at a time? I honestly didn’t have an answer for
> him.

It is possible. If you use Maui, you can set NODEACCESSPOLICY to SHARED (allow multiple jobs from any user) or SINGLEUSER (allow multiple jobs, but only from the same user). Either way, jobs can then be packed onto a node, down to one job per core.
The problem is usually misbehaving software. A job can request only one core but still use the whole node. For example, codes that use OpenMP will usually start one thread per core of the machine unless they are told otherwise (e.g. via OMP_NUM_THREADS).

The first step against that would be to set the node availability policy (NODEAVAILABILITYPOLICY) to COMBINED, so that Maui takes the resources actually utilized on a node into account in addition to the resources dedicated to jobs: http://www.adaptivecomputing.com/resources/docs/maui/5.4nodeavailability.php
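
As a rough illustration of the difference between the availability policies, here is a simplified model in Python. It is only a sketch of my reading of the documentation linked above, not Maui's actual code. The names NodeCpu and available_cores are made up for this example, and the real scheduler also tracks memory, swap and disk.

```python
from dataclasses import dataclass


@dataclass
class NodeCpu:
    configured: int   # cores the node is configured with
    dedicated: int    # cores requested (reserved) by the jobs running on it
    utilized: float   # cores actually busy, e.g. derived from the load


def available_cores(node: NodeCpu, policy: str) -> float:
    """Cores considered free on the node under the given policy (simplified)."""
    free_by_dedicated = node.configured - node.dedicated
    free_by_utilized = node.configured - node.utilized
    if policy == "DEDICATED":
        free = free_by_dedicated
    elif policy == "UTILIZED":
        free = free_by_utilized
    elif policy == "COMBINED":
        # Trust whichever view leaves fewer cores free.
        free = min(free_by_dedicated, free_by_utilized)
    else:
        raise ValueError(f"unknown policy: {policy}")
    return max(0, free)


# A job that requested 1 core but keeps all 8 cores busy:
greedy_node = NodeCpu(configured=8, dedicated=1, utilized=8.0)

if __name__ == "__main__":
    for policy in ("DEDICATED", "UTILIZED", "COMBINED"):
        print(policy, available_cores(greedy_node, policy))
```

In this model, a node running the greedy job still looks mostly free if only dedicated resources are counted, but looks full under COMBINED, so no further jobs would be placed on it.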
Then you can restrict the resources a job can actually use with cpusets, which confine its processes to the cores it was allocated (http://www.adaptivecomputing.com/resources/docs/torque/2-5-12/help.htm#topics/3-nodes/linuxCpusetSupport.htm).
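
To see what a cpuset changes from the job's point of view, a small Python check like the following can be run inside a job. It is a minimal sketch, assuming Linux and Python 3.3 or newer; usable_cores is a name I made up. The point is that well-behaved software should size its thread pool from the cores it is allowed to run on, not from the number of cores in the machine.

```python
import os


def usable_cores() -> int:
    """Number of cores this process may run on (not cores in the machine)."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: the affinity mask is limited by the cpuset the process is in.
        return len(os.sched_getaffinity(0))
    # Other platforms: fall back to the machine-wide count.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("cores in machine: ", os.cpu_count())
    print("cores usable here:", usable_cores())
```

Without a cpuset, both numbers will normally be the same. Inside a job confined to one core, the second number should be 1, and a wrapper script could use it to set OMP_NUM_THREADS before starting the real program.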

> My feeling is that the whole purpose of the cluster is to give the
> power of a whole node to process a job and not have to share it.

That really depends on the workload. If you have many independent single-core jobs (more of an HTC, i.e. high-throughput computing, workload), you'll want to allow one job per core. You could set your queue's resource defaults to allocate a whole node but let users explicitly request single cores, or even set up a dedicated queue for single-core jobs.

Greetings
André

-- 
André Gemünd
Fraunhofer-Institute for Algorithms and Scientific Computing
andre.gemuend at scai.fraunhofer.de
Tel: +49 2241 14-2193
/C=DE/O=Fraunhofer/OU=SCAI/OU=People/CN=Andre Gemuend

