[torqueusers] [Patch] GPUs by the way of GRES

Sean Reilly sean.reilly at ersa.edu.au
Fri Apr 20 00:16:00 MDT 2012


Hi Folks

   Here at eRSA we have just begun testing this patch. Great work by
Jonathan Michalon!
   So far it is doing everything we need.

   Roland,
   you still need to lock the assigned GPUs down to a particular user
and job ID on the backend nodes. We do this with cuda_wrapper, so there
is no real need for Maui to specify the particular GPU, e.g. gpu/2 or
gpu_2 (apart from it being a bit cleaner).

   We use both the Torque and Maui directives:
   torque                  #PBS -l gpus=1
   maui                    #PBS -W x=GRES:gpu@1

   The Maui directive plus the patch makes sure the requested number of
GPUs is actually available.
   Torque gives you the environment variable PBS_RESOURCE_GRES=gpus=1
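
   To make the two directives concrete, here is a minimal job-script
sketch. It only assumes what is described above: both directives are
given, and PBS_RESOURCE_GRES holds a string such as "gpus=1". The
gres_count helper is a hypothetical name of mine, and treating the
value as comma-separated name=count pairs is an assumption, not
documented Torque behaviour.

```shell
#!/bin/sh
#PBS -l gpus=1
#PBS -W x=GRES:gpu@1

# gres_count NAME SPEC
# Hypothetical helper: print the count for NAME in a spec like "gpus=1".
# Assumes comma-separated name=count pairs; prints 0 if NAME is absent.
gres_count() {
    name=$1
    spec=$2
    count=0
    old_ifs=$IFS
    IFS=,
    for pair in $spec; do
        case $pair in
            "$name"=*) count=${pair#*=} ;;
        esac
    done
    IFS=$old_ifs
    echo "$count"
}

# Report how many GPUs the job asked for (0 if the variable is unset).
echo "GPUs requested: $(gres_count gpus "${PBS_RESOURCE_GRES:-}")"
```

   The same helper could be reused in a prologue to decide how many
devices to hand out.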

   The prologue script is responsible for assigning an available GPU to
this user and job ID, via wrapper_init.
   When the job finishes or is killed, the epilogue releases the GPU
back into the pool, via wrapper_terminate.


   These two scripts should be aware of the GPUs available and in use
at any time.
   As Maui has already checked the count, a GPU should be available.
*If not, the prologue and epilogue can send the admins an error email
so it can be checked.*
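
   For anyone who wants to see the bookkeeping idea without
cuda_wrapper, here is a rough sketch of a per-node GPU pool. It is NOT
the cuda_wrapper interface (wrapper_init and wrapper_terminate are not
shown here); gpu_acquire, gpu_release and the pool directory layout are
hypothetical names of mine. It relies on mkdir being atomic to claim a
device.

```shell
#!/bin/sh
# Hypothetical stand-in for per-node GPU bookkeeping; not cuda_wrapper.

# gpu_acquire POOL_DIR TOTAL JOBID
# Claims the lowest free device index by creating POOL_DIR/gpuN
# (mkdir is atomic), records JOBID as the owner and prints N.
# Returns 1 if all TOTAL devices are already claimed.
gpu_acquire() {
    pool=$1
    total=$2
    jobid=$3
    mkdir -p "$pool" || return 1
    i=0
    while [ "$i" -lt "$total" ]; do
        if mkdir "$pool/gpu$i" 2>/dev/null; then
            echo "$jobid" > "$pool/gpu$i/owner"
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# gpu_release POOL_DIR JOBID
# Frees every device whose recorded owner is JOBID.
gpu_release() {
    pool=$1
    jobid=$2
    for d in "$pool"/gpu*; do
        [ -f "$d/owner" ] || continue
        if [ "$(cat "$d/owner")" = "$jobid" ]; then
            rm -rf "$d"
        fi
    done
}
```

   In a prologue the job ID arrives as $1, so the call would look like
dev=$(gpu_acquire /var/run/gpu-pool 2 "$1"), where the pool path and the
count of 2 are placeholders. The failure branch of that call is where
the error email to the admins would go, and the epilogue would call
gpu_release with the same pool path and job ID.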

   It's still early days for us, but so far so good.

   But yes, it would be nice if Maui could tell the backend nodes the
number of GPUs assigned (and possibly the device number), eliminating
the need for the extra #PBS -l gpus=1 setting.
   It is not a show stopper, though.


Regards
Sean


On 06/03/12 04:36, rf at q-leap.de wrote:
>>>>>> "Jonathan" == Jonathan Michalon<jonathan.michalon at etu.unistra.fr>  writes:
> Hi Jonathan,
>
> while your patch adds some functionality to count allocated GPUs as
> a GRES, it lacks the important functionality to tell the job which GPUs
> are available for it. If latest torque 2.5.x is built with GPU support,
> you have the option to specify a nodes spec like "-l nodes=1:gpus=1" and
> within the running job you can check $GPUFILE what GPUs you're
> allocated. Now the problem is that a job with a "-l nodes=1:gpus=1"
> specification won't be started with maui even if it has your patch. On
> the other hand, using your "-W x=GRES:gpu@1" spec (without a "-l
> nodes=1:gpus=1" spec) makes the job run, but
> it doesn't have an idea which GPU to use. Is there an easy way to extend
> your patch, so that maui will make a job run with the "-l
> nodes=1:gpus=1" spec?
>
> Cheers,
>
> Roland
>
>      Jonathan>  Hi Maui folks, GPUs in Maui are a long standing
>      Jonathan>  problem. Last year a patch was sent by Mariusz Mamoński
>      Jonathan>  [1], which works based on GRES parameters.  I've just made
>      Jonathan>  GPUs kind of working, by enhancing that patch. Please find
>      Jonathan>  attached the resulting patch, which works well for Maui
>      Jonathan>  3.3.1.  It defines a special GRES named "gpu" which works
>      Jonathan>  as expected on my test cases.
>
>      Jonathan>  Note that GRES behaviour seems quite confused as sometimes
>      Jonathan>  they are mentioned as consumable. This patch annihilates
>      Jonathan>  this behaviour, for the needs of GPUs.
>
>      Jonathan>  To use the patch: get the sources of maui-3.3.1 and patch
>      Jonathan>  them: patch -p1 < ../Patch-for-gpu-GRES.patch then compile
>      Jonathan>  as usual.
>
>      Jonathan>  You have to configure the GPUs in maui.cfg:
>      Jonathan>  NODECFG[nodename] GRES=gpu:2
>
>      Jonathan>  Then when queuing jobs you can request GPUs with (Torque
>      Jonathan>  syntax): qsub -W x=GRES:gpu@1
>
>      Jonathan>  I hope this helps, please test this and enhance to your
>      Jonathan>  needs!
>
>      Jonathan>  [1]
>      Jonathan>  http://www.supercluster.org/pipermail/mauiusers/2011-April/004622.html
>
>      Jonathan>  Regards,
>
>      Jonathan>  PS. This is the second attempt to send the mail…
>
>      Jonathan>  -- Jonathan Michalon IT student in Strasbourg


-- 
*Sean Reilly*

Systems Administrator & Applications Support Officer
eResearchSA
Phone : +61 8 8313 8352
Mobile: +61 450 840 246

<http://www.ersa.edu.au/moving>