[torqueusers] [Patch] GPUs by the way of GRES
sean.reilly at ersa.edu.au
Fri Apr 20 00:16:00 MDT 2012
Here at eRSA we have just begun testing with this patch. Great work
by Jonathan Michalon!
So far it is doing everything we need.
- You still need to lock the assigned GPUs down to
a particular user and job ID on the backend nodes.
- We do this with cuda_wrapper,
so there is no real need for Maui to specify the particular GPU, e.g.
gpu/2 or gpu_2
(apart from it just being a bit cleaner).
We use both the Torque and Maui directives:
Torque: #PBS -l gpus=1
Maui:   #PBS -W x=GRES:gpu@1
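As a rough sketch, a minimal job script carrying both directives could look like the one below. The job name and the echo line are placeholders for illustration only, and the `@` is written as the literal Maui syntax (the list archive renders `@` as " at ").

```shell
#!/bin/sh
#PBS -N gpu-example
#PBS -l gpus=1
#PBS -W x=GRES:gpu@1

# Outside a Torque job PBS_JOBID is unset, so fall back to a placeholder.
job_id="${PBS_JOBID:-interactive}"
echo "job ${job_id} asked for one GPU on $(uname -n)"
```

The script would be submitted with a plain `qsub script.sh`; both requests travel inside the file, so nothing extra is needed on the command line.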
The Maui-side directive plus the patch takes care of the number of GPUs actually
Torque gives you the environment variable PBS_RESOURCE_GRES=gpus=1
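For illustration, a prologue could read the requested GPU count out of that variable roughly as follows. This is only a sketch: the helper name gres_count is made up here, and the idea that several resources would be comma-separated is an assumption, not something confirmed above.

```shell
#!/bin/sh
# gres_count VALUE NAME
# Print the count recorded for NAME in a GRES string such as "gpus=1".
# Prints 0 when NAME is not present.
# Assumption: multiple resources, if any, are separated by commas.
gres_count() {
    value=$1
    name=$2
    old_ifs=$IFS
    IFS=,
    for item in $value; do
        case $item in
            "$name"=*)
                IFS=$old_ifs
                # Strip everything up to and including the first "=".
                echo "${item#*=}"
                return 0
                ;;
        esac
    done
    IFS=$old_ifs
    echo 0
}

# Fall back to the value quoted in this mail when run outside Torque.
gres_count "${PBS_RESOURCE_GRES:-gpus=1}" gpus
```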
The prologue script is responsible for assigning an available GPU
to this user and job ID, via wrapper_init.
When the job finishes or is killed, the epilogue releases the GPU back
into the pool, via wrapper_terminate.
These two scripts should be aware of the GPUs available and in use at
- As Maui has ensured they should be available. *If not, the
prologue and epilogue can send admins an error email so it can be checked.*
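The assign/release bookkeeping can be sketched in plain shell as below. To be clear, this is not cuda_wrapper and not what wrapper_init or wrapper_terminate actually do; it is a hypothetical stand-in that only tracks which GPU index belongs to which job. The function names, GPU_STATE_DIR and GPU_COUNT are all invented for the example.

```shell
#!/bin/sh
# Hypothetical GPU pool: one directory per claimed GPU, holding an "owner" file.
GPU_STATE_DIR=${GPU_STATE_DIR:-/tmp/gpu-pool-example}
GPU_COUNT=${GPU_COUNT:-2}

# gpu_acquire JOBID USER
# Claim the first free GPU and print its index; fail if none is free.
gpu_acquire() {
    mkdir -p "$GPU_STATE_DIR" || return 1
    i=0
    while [ "$i" -lt "$GPU_COUNT" ]; do
        # mkdir fails if the directory already exists, so only one job
        # can claim a given index.
        if mkdir "$GPU_STATE_DIR/gpu$i" 2>/dev/null; then
            echo "$1 $2" > "$GPU_STATE_DIR/gpu$i/owner"
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    # This is the point where a site could mail the admins.
    echo "no free GPU for job $1 (user $2)" >&2
    return 1
}

# gpu_release JOBID
# Return every GPU owned by JOBID to the pool.
gpu_release() {
    for dir in "$GPU_STATE_DIR"/gpu*; do
        [ -f "$dir/owner" ] || continue
        read -r owner_job owner_user < "$dir/owner"
        if [ "$owner_job" = "$1" ]; then
            rm -rf "$dir"
        fi
    done
}
```

Torque passes the job ID and the user name to the prologue and epilogue as the first two arguments, so a prologue would call something like `gpu_acquire "$1" "$2"` and the epilogue `gpu_release "$1"`.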
It's still early days for us, but so far so good.
But yes, it would be nice if Maui could tell the backend nodes
about the number of GPUs assigned (and possibly the device number),
eliminating the need for the extra #PBS -l gpus=1 setting.
But it is not a show stopper.
On 06/03/12 04:36, rf at q-leap.de wrote:
>>>>>> "Jonathan" == Jonathan Michalon<jonathan.michalon at etu.unistra.fr> writes:
> Hi Jonathan,
> while your patch adds some functionality to count allocated GPUs as
> a GRES, it lacks the important functionality to tell the job which GPUs
> are available for it. If latest torque 2.5.x is built with GPU support,
> you have the option to specify a nodes spec like "-l nodes=1:gpus=1" and
> within the running job you can check $GPUFILE what GPUs you're
> allocated. Now the problem is that a job with a "-l nodes=1:gpus=1"
> specification won't be started with maui even if it has your patch. On
> the other hand, using your "-W x=GRES:gpu@1" spec (without a "-l
> nodes=1:gpus=1" spec) makes the job run, but
> it doesn't have an idea which GPU to use. Is there an easy way to extend
> your patch, so that maui will make a job run with the "-l
> nodes=1:gpus=1" spec?
> Jonathan> Hi Maui folks, GPUs in Maui are a long standing
> Jonathan> problem. Last year a patch was sent by Mariusz Mamoński
> Jonathan> , which works based on GRES parameters. I've just made
> Jonathan> GPUs kind of working, by enhancing that patch. Please find
> Jonathan> attached the resulting patch, which works well for Maui
> Jonathan> 3.3.1. It defines a special GRES named "gpu" which works
> Jonathan> as expected on my test cases.
> Jonathan> Note that GRES behaviour seems quite confused as sometimes
> Jonathan> they are mentioned as consumable. This patch annihilates
> Jonathan> this behaviour, for the needs of GPUs.
> Jonathan> To use the patch: get the sources of maui-3.3.1 and patch
> Jonathan> them: patch -p1< ../Patch-for-gpu-GRES.patch then compile
> Jonathan> as usual.
> Jonathan> You have to configure the GPUs in maui.cfg:
> Jonathan> NODECFG[nodename] GRES=gpu:2
> Jonathan> Then when queuing jobs you can request GPUs with (Torque
> Jonathan> syntax): qsub -W x=GRES:gpu@1
> Jonathan> I hope this helps, please test this and enhance to your
> Jonathan> needs!
> Jonathan> 
> Jonathan> http://www.supercluster.org/pipermail/mauiusers/2011-April/004622.html
> Jonathan> Regards,
> Jonathan> PS. This is the second attempt to send the mail…
> Jonathan> -- Jonathan Michalon IT student in Strasbourg
Systems Administrator & Applications Support Officer
Phone : +61 8 8313 8352
Mobile: +61 450 840 246